Score/network woes

monitoring

#1

So my servers, like apparently many others, have had bouncing scores quite regularly for the last several weeks. My pair of pool servers are hosted on pretty decent circuits (multiple, BGP) and I’ve never seen problems like this last longer than tiny blips. Also of note, the servers are both in Northern California. I also ran MTR for the last 24 hours and saw 0% loss to ntplax7 with a 1ms standard deviation, yet my scores are currently ~5.

I’ve put one of them into the new monitoring and it’s seeing similar issues (and scores) but only from the Los Angeles monitoring station.

So, Ask, have you explored the possibility that Phyber is having issues (or a direct upstream of them)? That seems most likely to me at this point.


#2

For reference:
https://web.beta.grundclock.com/scores/198.169.208.142/log?limit=200&monitor=*


#3

Yeah, there’ve been some other similar threads, and my servers are affected as well. I wonder what constitutes a “failure” or red dot on the graphs … does that imply a complete failure to connect, some out-of-spec data point (e.g. network latency, offset, etc.), a combination of factors, or something else?


#4

As far as I know a red dot (big) can mean some network error (unable to retrieve the time from the ntp server in question). It’s UDP after all.

A red dot (small/big) can also mean wrong time (offset) by some threshold (I think it’s around 100ms) from the monitoring server’s point of view.

There is also an orange dot (small) if the time is not quite right but not enough to mark it red.


Below the graph is a link “What do the graphs mean?”

The Score graph
A couple of times an hour the pool system checks the time from your server and compares it to the local time. Points are deducted if the server can’t be reached or if the time offset is more than 100ms (as measured from the monitoring systems). More points are deducted the bigger the offset is.

The graph is only meant as a tool to visualize trends. For more exact details of what the monitoring system found you can click on the CSV link.
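
To make the rule above concrete, here is a rough Python sketch. Only the 100ms threshold and the two deduction conditions (unreachable, or offset too large) come from the quoted text. The function name, the point values and the way the deduction scales are made up for illustration and are not the pool's actual scoring formula.

```python
from typing import Optional

OFFSET_THRESHOLD_S = 0.100  # 100 ms, from the quoted description


def score_step(offset_s: Optional[float]) -> float:
    """Return a hypothetical score change for one monitoring check.

    offset_s is None when the server could not be reached.
    All point values below are illustrative assumptions.
    """
    if offset_s is None:
        return -5.0  # assumed penalty for an unreachable server
    magnitude = abs(offset_s)
    if magnitude <= OFFSET_THRESHOLD_S:
        return 1.0  # assumed reward for a sample within the threshold
    # Assumed scaling: the bigger the offset, the bigger the deduction,
    # capped at 4 points.
    return -min(4.0, 1.0 + magnitude / OFFSET_THRESHOLD_S)


for sample in (0.002, 0.150, 0.250, None):
    print(sample, score_step(sample))
```

The point of the sketch is only that a lost UDP reply and a large offset are both treated as failures, so a run of dropped packets on the path to one monitor can pull a score down even when the server's clock is fine.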

The Offset graph
The monitoring system works roughly like an SNTP (RFC 2030) client, so it is more susceptible to random network latencies between the server and the monitoring system than a regular ntpd server would be.

The monitoring system can be inaccurate by as much as 10ms or more.
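
To illustrate why an SNTP-style measurement is sensitive to path latency, here is a small Python sketch of the standard four-timestamp offset and delay calculation described in RFC 2030. Whether the monitor computes exactly this is an assumption, and the timestamps are invented for the example.

```python
def sntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Standard SNTP on-wire calculation (all times in seconds).

    t1: client sends request    (client clock)
    t2: server receives request (server clock)
    t3: server sends reply      (server clock)
    t4: client receives reply   (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical case: both clocks agree perfectly, but the path is
# asymmetric: 5 ms to the server and 25 ms back.
offset, delay = sntp_offset_and_delay(t1=0.000, t2=0.005, t3=0.005, t4=0.030)
print(f"apparent offset: {offset * 1000:.1f} ms")  # -10.0 ms
print(f"round-trip delay: {delay * 1000:.1f} ms")  # 30.0 ms
```

The calculation assumes the outbound and return paths take equal time, so half of any asymmetry shows up as apparent clock offset. A single-shot client has no filtering to smooth that out, whereas ntpd combines many samples.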