Our servers suffer massive down-score - and I don't have the slightest clue why

ChrisW · April 30, 2018, 9:11pm

Hi

We’ve been running a pair of NTP servers for the pool for several years now. One of them is high load (2 Gb setting, ~6 Mbps average); the second is lightly loaded (and basically supposed to be switched in should #1 fail).

These boxes have been up 24/7 for several years now, in a fully redundant data center with pretty good connectivity, and with standard operational monitoring.

So imagine my surprise when Ask’s robot emailed me this morning telling me that Server #1 has been removed from the pool because of low score. First I assumed the box had crashed - but it hadn’t. It was up (> 1000 days uptime right now), ntpd was up and serving. Synchronization was ok too - it synchronized to a DCF77 Stratum1 box ~200 km north, and had a GPS based Stratum1 ~200 km southeast as candidate. Both those Stratum1s are reliable Meinberg boxes.

And it was claimed to have a score of -13.3. Checking on the other, it also was scored pretty badly, at 11.something (and has since fallen to 4.8).

Being completely out of ideas I added another external stratum1 source and restarted ntpd on both boxes. And while box 1 is now very slowly creeping back to the 0 line, box 2 has since fallen way below the acceptability threshold…

Another thing I see is that they are monitored from the US West Coast. Both of these boxes are located in Central Europe (Frankfurt, Germany) - could it be that we are seeing here US west coast connectivity problems, not those of my boxes?

See for yourself:
http://www.pool.ntp.org/scores/195.50.171.101
http://www.pool.ntp.org/scores/195.50.171.102

Any idea? What can I do?

marki · May 2, 2018, 7:40am

Seems like temporary network issue. Right now your NTP server is available worldwide - https://atlas.ripe.net/measurements/12444930/#!probes
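
To repeat this kind of reachability check by hand from any host you control, a minimal SNTP client query can be sketched as below. This is only an illustration, not how the pool monitor or RIPE Atlas actually measures: the function names (`build_request`, `parse_response`, `query`) are made up for this example, it is IPv4 only, and it ignores the offset and delay calculations a real NTP client performs.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request() -> bytes:
    """Return a 48-byte client request: LI=0, VN=4, Mode=3, rest zero."""
    return b"\x23" + 47 * b"\x00"


def parse_response(packet: bytes) -> dict:
    """Extract a few header fields and the transmit timestamp from a reply."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    first, stratum = packet[0], packet[1]
    tx_sec, tx_frac = struct.unpack("!II", packet[40:48])
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": stratum,
        "transmit_unix": tx_sec - NTP_EPOCH_OFFSET + tx_frac / 2**32,
    }


def query(host: str, port: int = 123, timeout: float = 2.0) -> dict:
    """Send one request over UDP; raises socket.timeout if nothing comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (host, port))
        packet, _ = sock.recvfrom(512)
        rtt = time.time() - sent
    info = parse_response(packet)
    info["rtt"] = rtt
    return info


# Example use (needs network access):
#   print(query("195.50.171.101"))
```

A timeout here only says that the path from that particular host to the server dropped the request or the reply; it says nothing about other vantage points, which is why checks from several networks are more useful than one.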

curbynet · May 2, 2018, 2:44pm

Perhaps it’s related to this earlier issue?

AlisonW · May 2, 2018, 5:19pm

Very much so, yes. My server* is connected by IPv4 and (native) IPv6 so, obviously, holds exactly the same time yet the graph is regularly different.

UK, dedicated machine.

ChrisW · May 4, 2018, 5:58pm

It started again. Box fell down to 4.6 this afternoon and is now slowly climbing back (now at 8.7).
Is the Californian monitoring station affected by weekend traffic overload?

What is their IP?
What is their connectivity? Carrier? AS? Anything?

And why is there no monitoring from Europe?

Hedberg · May 4, 2018, 7:11pm

I have 3 servers on 2 different ISPs here in Denmark and they are all at 20 and have been so for days, so it is not a general problem.

ChrisW · May 7, 2018, 4:35pm

I am sorry guys, but this issue is still going on and I’m no closer to any solution.

Would somebody please answer my question? What are the IPs of the monitoring systems? What is their connectivity? Carrier?
I need this to have our peering people look into it.

ChrisW · May 11, 2018, 9:37am

Hello Hedberg,

as your “answer” seems to have effectively smothered any further discussion here, I have to state that this is of course a non-sequitur. The internet doesn’t work that way. That one place/ISP in Denmark has good connectivity to the US west coast monitoring stations has barely any implication for the connectivity of others there - or in neighboring countries.

And I still need the IP addresses of the monitoring stations.

mlichvar · May 11, 2018, 12:07pm

You can find the address of the LA monitoring station in this thread:
https://community.ntppool.org/t/problems-with-the-los-angeles-ipv4-monitoring-station

Other stations are running in the new beta pool:
https://web.beta.grundclock.com

ask · May 16, 2018, 3:06am

Also, network debugging information here: https://dev.ntppool.org/monitoring/network-debugging/

And yes, the beta site has more monitors and more details in the monitoring logs.
