Additional monitoring servers (help wanted)

ask · December 8, 2018, 7:52pm

Hi server folks,

I’m getting closer to being ready to add more monitoring systems to the Pool. Depending on how many we have, an early iteration would be to just have the system choose the closest 1-3 monitoring servers to monitor your server; later, operators could “choose their own”, and eventually maybe the computer can help pick in a smarter way.

I’m unconvinced how this will improve the quality of the pool service, but I don’t think it will make it worse and it might increase the available capacity. It definitely should help with the frustrating experience pool server operators have when “random internet noise” makes their server appear not to be working.

So… Most of the servers around the world I have access to are virtual machines or otherwise not clearly great monitoring servers.

Do any of you have a system that’s either or both of:

not virtualized (and/or good at keeping time with ntpd or chronyd).
close to or itself a high quality time service (next to a stratum 1 NTP or PTP source)

The monitoring daemon is a small Go program sending maybe a dozen NTP queries per second. It can run on a regular user account.

littlejason99 · December 8, 2018, 8:00pm

This is great news! I think it will really help some locations keep their servers scored in the pool. Or maybe we will discover some more widespread networking issues around the world, who knows… lol

I would think you would also want/require dual-stack networking?

W2AIQ · December 8, 2018, 11:02pm

When you say “close” to a stratum 1 server, how close is “close”? I use one at a university about 73 miles away, in Manhattan.

ask · December 9, 2018, 1:34am

It’s more about “reliable and consistent”, so ideally on another system right next to it or across the data center / campus.

The local internet connection should be reliable, have symmetric latency, generally symmetric routing and never be congested.
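
To illustrate why symmetric latency matters: a client estimates a server's clock offset from four timestamps using the standard NTP calculation, which assumes the outbound and return paths take equally long. The sketch below is not the pool's monitoring code; the function names and the sample delays are made up for illustration. It shows that any path asymmetry appears as an offset error of half the difference between the two one-way delays, even when the server's clock is perfect.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, forward_delay, return_delay, processing=0.0):
    """Hypothetical helper: build the four timestamps for one query.

    true_offset is (server clock - client clock); delays are in seconds.
    """
    t1 = 0.0
    t2 = t1 + forward_delay + true_offset   # arrival, read on the server clock
    t3 = t2 + processing                    # reply leaves the server
    t4 = (t3 - true_offset) + return_delay  # arrival, read on the client clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    # Same perfect server (true offset 0), symmetric vs. asymmetric path.
    for fwd, ret in [(0.010, 0.010), (0.030, 0.010)]:
        off, delay = ntp_offset_and_delay(*simulate_exchange(0.0, fwd, ret))
        print(f"forward={fwd * 1000:.0f} ms return={ret * 1000:.0f} ms "
              f"-> measured offset={off * 1000:.1f} ms, delay={delay * 1000:.1f} ms")
```

With 30 ms out and 10 ms back, a perfectly synchronized server looks 10 ms off, which is why a monitor on an asymmetric or congested link would score good servers unfairly.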

W2AIQ · December 9, 2018, 1:52am

Also, what OS does the monitoring software run on?

ask · December 9, 2018, 2:04am

Any variant of Linux or FreeBSD should work.

alica · December 9, 2018, 6:12am

What is your program’s memory footprint? I don’t know if my weak 103.226.213.30 would qualify… Poor 1 GHz Celeron with a GPS PPS source and a 100M/100M FTTH connection. I won’t mind if I have to reduce its pool bandwidth to 384k to save CPU time for monitoring.

faelix · December 9, 2018, 8:52am

I can offer either/or:

an arm64 dedicated physical node (e.g. Pine64 or Rock64; Go compiles just fine) with an industrial SD card
an amd64 virtual host which can keep good time, and isn’t on an overloaded physical node (e.g. it could live on our own NFV cluster, rather than the customer VM pool)

…either in a datacentre in Geneva with 1 hop to leontp.g.faelix.net (stratum-1 LeoNTP of ours); or in an Equinix DC in the UK with peering across to LINX who provide stratum-1 sources.

AlisonW · December 9, 2018, 6:30pm

Re: “unconvinced how this will improve the quality of the pool service” I suspect it won’t improve the pool but will improve the feel-good factor for us server suppliers, as we’ll feel our stats are more meaningful. E.g. my server (physical, solely does the NTP thing) north of London, UK, is monitored from across an ocean and a continent so, unsurprisingly, how its timekeeping is measured can vary quite a bit for reasons unrelated to either the box or the local connection. The most notable proof of that is the comparison between IPv4 and IPv6, where one box with native connections for each protocol gets much greater variation on the former than the latter.

debbiep · December 10, 2018, 12:01am

I’m in Australia. I’m keen to help. My servers at ntp.polyfoam.com.au and ntp.icemoonprison.com in Melbourne frequently fall off the pool because the monitoring from Los Angeles is losing packets (for reasons unknown). Beta monitoring from Zurich produces much better scores, which makes me think that it’s not my servers per se, but the monitoring.

My NTP server at ntp.polyfoam.com.au is a Raspberry Pi with a PPS GPS hat. My network connection is 100 Mbit symmetric, native IPv4 and IPv6 dual stack. If the Pi can run the monitoring software directly then I can start any time. Otherwise I will need to install a physical machine in the server rack, which may take some days.

bjohns · December 10, 2018, 4:07am

I should be able to assist.

If I read correctly, this will initially be for localised monitoring, with the aim of broader monitoring?

Knot3n · December 12, 2018, 7:18am

Hello,

we would like to help you out.
We are located in Frankfurt am Main and have direct peerings with most of the biggest upstream providers, as well as very good latency.

If you are interested - please write me a PM.

nikolay · December 28, 2018, 9:48pm

I can offer access to one LP machine acting as a stratum 1 with a Meinberg GPS169PCI GPS receiver.
This is its time quality at the moment:

root@pve:/proc# ntpq -p
remote refid st t when poll reach delay offset jitter

*SHM(0) .GPSs. 0 l 14 16 377 0.000 -0.002 0.000
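
For anyone reading the ntpq output above: the reach column is an 8-bit shift register printed in octal, with one bit per recent poll and the newest poll in the lowest bit, so 377 means the last eight polls all got a response. A small sketch to decode it (the helper name is made up for this example):

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, newest first (True = reply received)."""
    value = int(reach_octal, 8)
    return [bool((value >> bit) & 1) for bit in range(8)]


if __name__ == "__main__":
    results = decode_reach("377")
    print(f"{sum(results)} of {len(results)} recent polls answered")
```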

Bryce · January 3, 2019, 12:59am

I’ve got a server in the same DC as a stratum 1, and it appears to be able to follow it very closely. A few days ago, though, the stratum 1 went out of sync by 75 ms, which was odd; I noticed a number of other servers go out by 75 ms at that time too, so it appeared to be somewhat widespread. I also have a server in Germany that is not as close to a stratum 1 but appears to follow much more closely, staying within less than 5 ms as measured by your current monitoring system. They are both virtualized but appear to be very good at tracking.

kboling · February 6, 2019, 8:49pm

@ask
My pool servers are all in the same DC as a stratum-1, just a single hop away from it. They’re bare-metal racked servers in very well-connected datacenters on both U.S. coasts, in Europe, and in Asia. I’d be happy to run your monitoring daemon on them.

alica · February 7, 2019, 9:39am

Thanks for providing solid servers in Taiwan! But the IPv4 addresses of t1.time.tw1.yahoo.com and t2.time.tw1.yahoo.com were misclassified by the pool system as being in Singapore. Hope this can be fixed.

CHL · March 3, 2019, 10:21pm

My system is virtualized (hosted by Aliyun AS37963) and 50 ms away from a stratum 1 source. But I think it is keeping good time; chronyc tracking shows the following:

chronyc tracking
Reference ID : 202.118.1.46 (ntp.neu.edu.cn)
Stratum : 2
Ref time (UTC) : Sun Mar 3 22:03:29 2019
System time : 0.000008752 seconds slow of NTP time
Last offset : -0.000003167 seconds
RMS offset : 0.000022342 seconds
Frequency : 0.001 ppm slow
Residual freq : -0.000 ppm
Skew : 0.004 ppm
Root delay : 0.050510 seconds
Root dispersion : 0.000484 seconds
Update interval : 1027.5 seconds
Leap status : Normal

This system is also within China, so maybe it can help monitor CN servers; at least it is better than nothing. You might want to give it a try until an owner of real servers in China shows up.
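
The chronyc tracking output in the post above is a list of "Key : value" lines, so it is easy to pull numbers out of it when comparing candidate monitors. A rough sketch, assuming the field layout shown in that post; the function names and the sign convention (negative means the system clock is behind NTP time) are choices made for this example, not part of chrony:

```python
SAMPLE = """\
Stratum : 2
System time : 0.000008752 seconds slow of NTP time
Root delay : 0.050510 seconds
Root dispersion : 0.000484 seconds
"""


def parse_tracking(text):
    """Split each 'Key : value' line on the first colon; skip other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def system_time_error(fields):
    """Seconds the system clock is ahead (+) or behind (-) NTP time."""
    parts = fields["System time"].split()
    magnitude = float(parts[0])
    return -magnitude if "slow" in parts else magnitude


def root_delay(fields):
    """Total round-trip delay to the stratum 1 source, in seconds."""
    return float(fields["Root delay"].split()[0])


if __name__ == "__main__":
    fields = parse_tracking(SAMPLE)
    print(f"stratum {fields['Stratum']}, "
          f"clock error {system_time_error(fields) * 1e6:.3f} us, "
          f"root delay {root_delay(fields) * 1e3:.2f} ms")
```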

iocc · March 8, 2019, 12:34pm

I’m 0.6 ms away from two stratum 1 servers. It’s connected with 8x1 Gbit/s links.
I can run a monitor VM in Sweden if that would help.

anders.e.e.wallin · March 25, 2019, 7:36am

We are interested in running this at the Finnish UTC-lab, VTT MIKES. Let us know how to proceed.

NoahMcNallie · March 27, 2019, 4:21am

If you consider KVM good enough at keeping time, then feel free to run it on montreal.ca.logiplex.net or logiplex.net.

How much memory does it use? As far as I know, Go is an efficient language. I don’t know much about it, but I hung out in the channel on freenode for a while and seem to have gathered that.

The first is at OVH, while the latter is at Vultr/Choopa, which is primarily NTT.

I could also donate one of my two VPSes in Hong Kong, which are paid for the next year, and dedicate it to the purpose. You can actually have root if you’d like, because all they do is NTP and I don’t use them for anything else. They are OpenVZ, but they do keep really good time: an NTP test from https://servertest.online/ntp showed results in the millionths of a second for one of them recently.

They are all configured with the nearest public stratum 1.

Noah
