Additional monitoring servers (help wanted)

Hi server folks,

I’m getting closer to being ready to add more monitoring systems to the Pool. Depending on how many we have, an early iteration would be to just have the system choose the closest 1-3 monitoring servers to monitor your server; later, operators could “choose their own”, and eventually maybe the computer can help pick in a smarter way.

I’m unconvinced how this will improve the quality of the pool service, but I don’t think it will make it worse, and it might increase the available capacity. It definitely should help with the frustrating experience pool server operators have when “random internet noise” makes their server appear not to be working.

So… most of the servers around the world that I have access to are virtual machines or otherwise not obviously great monitoring servers.

Do any of you have a system that’s either or both of:

  • not virtualized (and/or good at keeping time with ntpd or chronyd)
  • close to or itself a high quality time service (next to a stratum 1 NTP or PTP source)

The monitoring daemon is a small Go program sending maybe a dozen NTP queries per second. It can run on a regular user account.

This is great news! I think it will really help keep servers in some locations scored in the pool. Or maybe we will discover some more widespread networking issues around the world, who knows… lol

I would think you would also want/require dual-stack networking?

When you say “close” to a stratum 1 server, how close is “close”? I use one at a university about 73 miles away in Manhattan.

It’s more about “reliable and consistent”, so ideally on another system right next to it or across the data center / campus.

The local internet connection should be reliable, have symmetric latency, generally symmetric routing and never be congested.
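To spell out why symmetric latency matters, here is the standard NTP offset calculation as a short worked derivation. The symbols ($\theta$ for the true clock offset, $d_f$ and $d_r$ for the forward and return one-way delays) are notation introduced for this illustration only.

```latex
% T1 = client transmit, T2 = server receive, T3 = server transmit, T4 = client receive
\begin{align*}
T_2 - T_1 &= \theta + d_f, \qquad T_4 - T_3 = d_r - \theta \\
\hat{\theta} &= \frac{(T_2 - T_1) + (T_3 - T_4)}{2}
             = \theta + \frac{d_f - d_r}{2}
\end{align*}
```

The measured offset $\hat{\theta}$ equals the true offset only when $d_f = d_r$. For example, a path that takes 30 ms out and 10 ms back makes a perfectly synchronized server look 10 ms off, and congestion that comes and goes in one direction shows up as offset noise.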

Also, what OS does the monitoring software run on?

Any variant of Linux or FreeBSD should work.

What is your program’s memory footprint? I don’t know if my weak 103.226.213.30 would qualify… It’s a poor 1 GHz Celeron with a GPS PPS source and a 100M/100M FTTH connection. I won’t mind if I have to reduce its pool bandwidth to 384k to save CPU time for monitoring.

I can offer either/or:

  1. arm64 dedicated physical node (e.g. Pine64 or Rock64; Go compiles just fine) with an industrial SD card
  2. amd64 virtual host which can keep good time, and isn’t on an overloaded physical node (e.g. could live on our own NFV cluster, rather than customer VM pool)

…either in a datacentre in Geneva with 1 hop to leontp.g.faelix.net (stratum-1 LeoNTP of ours); or in an Equinix DC in the UK with peering across to LINX who provide stratum-1 sources.

Re: “unconvinced how this will improve the quality of the pool service”: I suspect it won’t improve the pool but will improve the feel-good factor for us server suppliers, as we’ll feel our stats are more meaningful. E.g. my server (physical, solely does the NTP thing) north of London, UK, is monitored from across an ocean and a continent so, unsurprisingly, its measured timekeeping can vary quite a bit for reasons unrelated to either the box or the local connection. The most notable proof of that is the comparison between IPv4 and IPv6, where one box with native connections for each protocol gets much greater variation on the former than the latter.

I’m in Australia. I’m keen to help. My servers at ntp.polyfoam.com.au and ntp.icemoonprison.com in Melbourne frequently fall off the pool because the monitoring from Los Angeles is losing packets (for reasons unknown). Beta monitoring from Zurich produces much better scores, which makes me think that it’s not my servers per se, but the monitoring.

My NTP server at ntp.polyfoam.com.au is a Raspberry Pi with a PPS GPS hat. My network connection is 100 Mbit symmetric, native IPv4 and IPv6 dual stack. If the Pi can run the monitoring software directly then I can start any time. Otherwise I will need to install a physical machine in the server rack, which may take some days.

I should be able to assist.

If I read correctly, this will initially be for localised monitoring, with the aim of broader monitoring later?

Hello,

We would like to help you out.
We are located in Frankfurt am Main and have direct peering with most of the biggest upstream providers, as well as very good latency.

If you are interested, please send me a PM.

I can offer access to one LP machine acting as a stratum 1 with a Meinberg GPS169PCI GPS receiver.
This is the time quality at the moment:

root@pve:/proc# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
*SHM(0)          .GPSs.           0 l   14   16  377    0.000   -0.002   0.000

I’ve got a server in the same DC as a stratum 1, and it appears to be able to follow it very closely. A few days ago, though, the stratum 1 went out of sync by 75 ms, which was odd; I noticed a number of other servers go out by 75 ms at that time too, so it appeared to be somewhat widespread. I also have a server in Germany that is not as close to a stratum 1 but appears to follow much more closely, staying within 5 ms as measured by your current monitoring system. They are both virtualized but appear to be very good at tracking.

@ask
My pool servers are all in the same DC as a stratum-1, just a single hop away from it. They’re bare-metal racked servers in very well-connected datacenters on both U.S. coasts, Europe, and Asia. I’d be happy to run your monitoring daemon on them.

Thanks for providing solid servers in Taiwan! But the IPv4 addresses of t1.time.tw1.yahoo.com and t2.time.tw1.yahoo.com are misclassified by the pool system as being in Singapore. I hope this can be fixed.

My system is virtualized (hosted by Aliyun, AS37963) and 50 ms away from a stratum 1 source. But I think it is keeping good time; chronyc tracking shows the following.

chronyc tracking
Reference ID    : 202.118.1.46 (ntp.neu.edu.cn)
Stratum         : 2
Ref time (UTC)  : Sun Mar 3 22:03:29 2019
System time     : 0.000008752 seconds slow of NTP time
Last offset     : -0.000003167 seconds
RMS offset      : 0.000022342 seconds
Frequency       : 0.001 ppm slow
Residual freq   : -0.000 ppm
Skew            : 0.004 ppm
Root delay      : 0.050510 seconds
Root dispersion : 0.000484 seconds
Update interval : 1027.5 seconds
Leap status     : Normal

This system is also within China, so it could help monitor CN servers, which is at least better than nothing. You might want to give it a try until an owner of real servers in China shows up.

I’m 0.6 ms away from two stratum 1 servers. It’s connected with 8x1 Gbit/s links.
I can run a monitor VM in Sweden if that would help.

We are interested in running this at the Finnish UTC-lab, VTT MIKES. Let us know how to proceed :)

If you consider KVM good enough at keeping time, feel free to run it on montreal.ca.logiplex.net or logiplex.net.

How much memory does it use? As far as I know, Go is an efficient language. I don’t know much about it, but I hung out in the channel on freenode for a while and that’s what I gathered.

The first is at OVH, while the latter is at Vultr/Choopa, which is primarily NTT.

I could also donate one of my two VPSes in Hong Kong, which are paid for the next year, and dedicate it to the purpose. You can actually have root if you’d like, because all they do is NTP and I don’t use them for anything else. They are OpenVZ, but they do keep really good time: an NTP test from https://servertest.online/ntp showed results in the millionths of a second for one of them recently.

They are all configured with the nearest public stratum 1.

Noah