Additional monitoring servers (help wanted)

#12

Hello,

We at 23media.com would like to help out.
We are located in Frankfurt am Main and have direct peering with most of the biggest upstream providers, as well as very good latency.

If you are interested, please send me a PM.

#13

I can offer access to one LP machine acting as a stratum 1 with a Meinberg GPS169PCI GPS receiver.
This is its time quality at the moment:

root@pve:/proc# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
*SHM(0)          .GPSs.           0 l   14   16  377    0.000   -0.002   0.000

#14

I’ve got a server in the same DC as a stratum 1, and it appears to follow it very closely. A few days ago, though, the stratum 1 went out of sync by 75 ms, which was odd. I noticed a number of other servers go out by 75 ms at that time too, so it appeared to be somewhat widespread. I also have a server in Germany that is not as close to a stratum 1 but appears to follow much more closely, staying within 5 ms as measured by your current monitoring system. Both are virtualized but appear to be very good at tracking.

#15

@ask
My pool servers are all in the same DC as a stratum-1, just a single hop away from it. They’re bare-metal racked servers in very well-connected datacenters on both U.S. coasts, in Europe, and in Asia. I’d be happy to run your monitoring daemon on them.

#16

Thanks for providing solid servers in Taiwan! However, the IPv4 addresses of t1.time.tw1.yahoo.com and t2.time.tw1.yahoo.com were misconfigured in the pool system as being from Singapore. I hope this can be fixed.

#17

My system is virtualized (hosted by Aliyun, AS37963) and 50 ms away from a stratum 1 source, but I think it is keeping good time. Here is what chronyc tracking shows:

chronyc tracking
Reference ID : 202.118.1.46 (ntp.neu.edu.cn)
Stratum : 2
Ref time (UTC) : Sun Mar 3 22:03:29 2019
System time : 0.000008752 seconds slow of NTP time
Last offset : -0.000003167 seconds
RMS offset : 0.000022342 seconds
Frequency : 0.001 ppm slow
Residual freq : -0.000 ppm
Skew : 0.004 ppm
Root delay : 0.050510 seconds
Root dispersion : 0.000484 seconds
Update interval : 1027.5 seconds
Leap status : Normal

This system is also within China, so it may help with monitoring CN servers; at least it is better than nothing. You might want to give it a try until an owner of real servers in China shows up.

#18

I’m 0.6 ms away from two stratum 1 servers. It’s connected with 8x1 Gbit/s links.
I can run a monitor VM in Sweden if that would help.

#19

We are interested in running this at the Finnish UTC-lab, VTT MIKES. Let us know how to proceed :slight_smile:

#20

If you consider KVM good enough at keeping time, feel free to run it on montreal.ca.logiplex.net or logiplex.net.

How much memory does it use? As far as I know, Go is an efficient language. I don’t know much about it, but I hung out in the channel on freenode for a while and seem to have gathered as much.

The first is at OVH, while the latter is at Vultr/Choopa, which is primarily NTT.

I could also donate one of my two VPSes in Hong Kong, which are paid for the next year, and dedicate it to this purpose. You can have root if you’d like, because all they do is NTP and I don’t use them for anything else. They are OpenVZ, but they keep really good time: an NTP test from https://servertest.online/ntp recently showed an offset in the millionths of a second for one of them.

They are all configured with the nearest public stratum 1.

Noah

#21

I really hope this gets done. All my servers are dropping out as soon as they score up. There is nothing wrong with them that they can’t handle though.

Noah

#22

All my servers are dropping out as soon as they score up. There is nothing wrong with them that they can’t handle though.

If they are only dropping out when they get added to the rotation (score >= 10), then the issue is unlikely to be monitoring, and there is probably actually something wrong on your end. If the problem was a monitoring issue, it should also still be dropping when the score is below 10.

If the issue is not with the server, then I’d recommend you take a look at the network - in particular, that connection tracking (and by implication NAT) is disabled for NTP packets, in both directions. The server is not the only place where NTP traffic can cause load problems. Depending on your network topology, this may need to be configured on more than one router.

Something else to consider - when initially setting up an NTP server on my current ISP, it tripped their DDOS-mitigation rules, and much of the traffic was dropped before it even reached me. This was resolved by simply calling them and having my server’s IP excluded from that monitoring. If you are certain that you have ruled out both your server and your network as potential causes, then talking to your upstream(s) may be a good idea.

#23

It is. Quoting from my server’s monitor log:

1553776099,"2019-03-28 12:28:19",0.035440914,1,-1.9,1,"Los Angeles",0,
1553776099,"2019-03-28 12:28:19",0.035440914,1,-1.9,,,0,
1553775143,"2019-03-28 12:12:23",0,-5,-3.1,1,"Los Angeles",,"i/o timeout"
1553775143,"2019-03-28 12:12:23",0,-5,-3.1,,,,"i/o timeout"
1553774179,"2019-03-28 11:56:19",0.033143745,1,2,1,"Los Angeles",0,
1553774179,"2019-03-28 11:56:19",0.033143745,1,2,,,0,
1553773207,"2019-03-28 11:40:07",0.036694936,1,1.1,1,"Los Angeles",0,
1553773207,"2019-03-28 11:40:07",0.036694936,1,1.1,,,0,
1553772238,"2019-03-28 11:23:58",0.037754256,1,0.1,1,"Los Angeles",0,
1553772238,"2019-03-28 11:23:58",0.037754256,1,0.1,,,0,
1553771333,"2019-03-28 11:08:53",0.032810487,1,-1,1,"Los Angeles",0,
1553771333,"2019-03-28 11:08:53",0.032810487,1,-1,,,0,
1553770236,"2019-03-28 10:50:36",0,-5,-2.1,1,"Los Angeles",,"i/o timeout"
1553770236,"2019-03-28 10:50:36",0,-5,-2.1,,,,"i/o timeout"
1553769058,"2019-03-28 10:30:58",0.036298195,1,3.1,1,"Los Angeles",0,
1553769058,"2019-03-28 10:30:58",0.036298195,1,3.1,,,0,
1553767887,"2019-03-28 10:11:27",0,-5,2.2,1,"Los Angeles",,"i/o timeout"
1553767887,"2019-03-28 10:11:27",0,-5,2.2,,,,"i/o timeout"
1553766671,"2019-03-28 09:51:11",0.026999843,1,7.6,1,"Los Angeles",0,
1553766671,"2019-03-28 09:51:11",0.026999843,1,7.6,,,0,
1553765564,"2019-03-28 09:32:44",0.034973041,1,6.9,1,"Los Angeles",0,
1553765564,"2019-03-28 09:32:44",0.034973041,1,6.9,,,0,
1553764393,"2019-03-28 09:13:13",0.033346673,1,6.2,1,"Los Angeles",0,
1553764393,"2019-03-28 09:13:13",0.033346673,1,6.2,,,0,
1553763277,"2019-03-28 08:54:37",0,-5,5.5,1,"Los Angeles",,"i/o timeout"
1553763277,"2019-03-28 08:54:37",0,-5,5.5,,,,"i/o timeout"
1553762148,"2019-03-28 08:35:48",0.032394437,1,11.1,1,"Los Angeles",0,
1553762148,"2019-03-28 08:35:48",0.032394437,1,11.1,,,0,
1553760980,"2019-03-28 08:16:20",0.028899206,1,10.6,1,"Los Angeles",0,
1553760980,"2019-03-28 08:16:20",0.028899206,1,10.6,,,0,
1553759770,"2019-03-28 07:56:10",0.034931327,1,10.1,1,"Los Angeles",0,
#24

Not everybody has the same issue. NoahMcNallie indicated that their server was only having issues once it was added to the rotation (i.e. score >= 10), which indicates that monitoring isn’t the issue in that case.

In your case, the issue appears to be different, as the monitor is still complaining about it regardless of whether or not the server is part of the pool rotation. This may or may not be a symptom of problems with the monitoring; all it tells us is that the monitor is frequently attempting to probe your server and receiving no reply. There are many possible reasons for that to occur. I note that you have been active in the China thread - be aware that there are issues with monitoring servers behind the Great Firewall, so if your server is inside China that may well be what is happening here. If your server is (or was) in the China zone, regardless of where it is geographically located, the load is also a bit of a special case.
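
As a side note on reading these logs: the step and score columns quoted in #23 are consistent with a simple decaying update, where each probe multiplies the old score by 0.95 and adds the step (+1 for a reply, -5 for a timeout). That rule is inferred from the numbers in the log, not taken from the pool's code, and the names below (`next_score`, `DECAY`, `probes_to_recover`) are made up for illustration. A minimal Python sketch that checks the rule against those entries:

```python
# Sketch only: the 0.95 decay is an assumption inferred from the log in #23,
# not something confirmed by the pool's monitoring code.
DECAY = 0.95
IN_ROTATION = 10.0  # threshold mentioned in #22 (score >= 10)

# (step, score) pairs from the log in #23, oldest first.
LOG = [
    (1, 10.1), (1, 10.6), (1, 11.1), (-5, 5.5), (1, 6.2), (1, 6.9),
    (1, 7.6), (-5, 2.2), (1, 3.1), (-5, -2.1), (1, -1.0), (1, 0.1),
    (1, 1.1), (1, 2.0), (-5, -3.1), (1, -1.9),
]


def next_score(score, step, decay=DECAY):
    """Apply one probe result to the previous score."""
    return score * decay + step


def max_one_step_error(log):
    """Largest gap between the predicted and the logged score."""
    worst = 0.0
    for (_, prev), (step, logged) in zip(log, log[1:]):
        worst = max(worst, abs(next_score(prev, step) - logged))
    return worst


def probes_to_recover(score, threshold=IN_ROTATION):
    """Count consecutive successful probes needed to reach the threshold."""
    count = 0
    while score < threshold:
        score = next_score(score, 1)
        count += 1
    return count


if __name__ == "__main__":
    # The log is rounded to one decimal, so gaps below 0.1 are expected.
    print("largest one-step error:", round(max_one_step_error(LOG), 3))
    print("good probes needed from -1.9:", probes_to_recover(-1.9))
```

If the rule holds, a single timeout undoes several successful probes, which would explain why a server with intermittent packet loss towards the monitor struggles to stay at or above 10.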

#25

Nope, I am in Taiwan, not behind the GFW, but I have also faced network problems on the American continent, as previously stated. I don’t think a normal America-to-Asia packet should traverse the American continent twice… LAX to ASH to NYC to LAX before leaving US soil.

#26

There are multiple things going on, I think. Both VPSes, the one in China and the one in Canada, are getting quite a few thousand NTP requests a second, even though the China VPS has been set to 384KiBit, and both are falling out. They are serving a purpose, though, and it should clear up, I think. My American VPS’s IPv4 score is very low too, and it looks to have fallen out recently; that one usually stays above. Also, the OVH VPS in Canada has been under DDoS, on and off, since I purchased it. Pretty sure they sold it to me like that :expressionless:

Noah

#27

Not to keep this thread off topic any more than it should be… but montreal.ca.logiplex.net made it to an 11+ score (11.4 right now) in CN at 100MiBit for the first time that I have noticed. I think this might be a sign of good things to come. These VPSes are configured the same as my other RHEL-family VPS servers (well, Fedora and two CentOS), which is basically `yum update`, iptables and ip6tables rules allowing port 123, a few sysctl parameters, and base ntpd with the stratum 1 servers changed to ones that are local to them. There really is not anything more than this done to them. I don’t use them for anything else, and the other one was doing fine before without being modified. That one I am kind of confused about, but I guess montreal’s trouble is a mixture of the DDoS it has been experiencing and the monitoring station. I’m not claiming beyond a doubt that there is nothing … possibly … wrong with them, but I don’t think so. They query fine from around the world. If anything, there would be some sort of rate limiting happening in China and/or at the ISP.

Noah

#28

One last thing: I do have a Sun T2000, which is one of the last machines Sun made before selling to Oracle. It has 64 GB of 16-channel DDR2, an eight-core UltraSPARC T1, and an unused 72GiByte 15K RPM SAS drive. When I can get it up and running would be under discrepancy, and it would not be anywhere that has DDoS protection. I do plan on moving soon to somewhere that I could probably have it running on something like a 10/100 D3 line.

Noah

#29

I have the same thing from Sweden, not getting over 0 anymore…
My own external monitoring shows about 0.03% failure, from LA about 30% failure…

1554224910,"2019-04-02 17:08:30",-0.002919288,1,-25.5,1,"Los Angeles",0,
1554224910,"2019-04-02 17:08:30",-0.002919288,1,-25.5,0,
1554223802,"2019-04-02 16:50:02",0.001310674,1,-27.9,1,"Los Angeles",0,
1554223802,"2019-04-02 16:50:02",0.001310674,1,-27.9,0,
1554222744,"2019-04-02 16:32:24",-0.005134751,1,-30.4,1,"Los Angeles",0,
1554222744,"2019-04-02 16:32:24",-0.005134751,1,-30.4,0,
1554221679,"2019-04-02 16:14:39",0,-5,-33.1,1,"Los Angeles","i/o timeout"
1554221679,"2019-04-02 16:14:39",0,-5,-33.1,"i/o timeout"
1554220590,"2019-04-02 15:56:30",0,-5,-29.6,1,"Los Angeles","i/o timeout"
1554220590,"2019-04-02 15:56:30",0,-5,-29.6,"i/o timeout"
1554219516,"2019-04-02 15:38:36",0,-5,-25.9,1,"Los Angeles","i/o timeout"
1554219516,"2019-04-02 15:38:36",0,-5,-25.9,"i/o timeout"
1554218350,"2019-04-02 15:19:10",0,-5,-21.9,1,"Los Angeles","i/o timeout"
1554218350,"2019-04-02 15:19:10",0,-5,-21.9,"i/o timeout"
1554217239,"2019-04-02 15:00:39",-0.002086741,1,-17.8,1,"Los Angeles",0,
1554217239,"2019-04-02 15:00:39",-0.002086741,1,-17.8,0,
1554216104,"2019-04-02 14:41:44",0,-5,-19.8,1,"Los Angeles","i/o timeout"
1554216104,"2019-04-02 14:41:44",0,-5,-19.8,"i/o timeout"
1554214869,"2019-04-02 14:21:09",-0.003405503,1,-15.6,1,"Los Angeles",0,
1554214869,"2019-04-02 14:21:09",-0.003405503,1,-15.6,0,
1554213772,"2019-04-02 14:02:52",0,-5,-17.5,1,"Los Angeles","i/o timeout"
1554213772,"2019-04-02 14:02:52",0,-5,-17.5,"i/o timeout"
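
A rough way to turn a log excerpt like this into a failure rate is sketched below in Python. It assumes the first column is the probe's epoch time, that each probe shows up twice (once with the monitor name, once without), and that a failed probe carries the text `i/o timeout`. The function name `timeout_rate` is made up for this example, and the sample is just the first eight lines of the log above.

```python
import csv

# First eight lines of the log above (four probes, each listed twice).
SAMPLE = """\
1554224910,"2019-04-02 17:08:30",-0.002919288,1,-25.5,1,"Los Angeles",0,
1554224910,"2019-04-02 17:08:30",-0.002919288,1,-25.5,0,
1554223802,"2019-04-02 16:50:02",0.001310674,1,-27.9,1,"Los Angeles",0,
1554223802,"2019-04-02 16:50:02",0.001310674,1,-27.9,0,
1554222744,"2019-04-02 16:32:24",-0.005134751,1,-30.4,1,"Los Angeles",0,
1554222744,"2019-04-02 16:32:24",-0.005134751,1,-30.4,0,
1554221679,"2019-04-02 16:14:39",0,-5,-33.1,1,"Los Angeles","i/o timeout"
1554221679,"2019-04-02 16:14:39",0,-5,-33.1,"i/o timeout"
"""


def timeout_rate(log_text):
    """Return (timed_out, total), counting each distinct epoch once."""
    # Forum software may turn straight quotes into typographic ones.
    cleaned = log_text.replace("\u201c", '"').replace("\u201d", '"')
    probes = {}
    for row in csv.reader(cleaned.splitlines()):
        if not row:
            continue  # skip blank lines
        epoch = int(row[0])
        timed_out = "i/o timeout" in row
        probes[epoch] = probes.get(epoch, False) or timed_out
    return sum(probes.values()), len(probes)


if __name__ == "__main__":
    failed, total = timeout_rate(SAMPLE)
    print(f"{failed} of {total} probes timed out ({100 * failed / total:.0f}%)")
```

Running the same function over a longer download of the monitor log would give a figure that can be compared directly with the percentage from an independent monitor.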

#30

I just tested logiplex.net on KVM with a tsc clocksource, and it differs by 0.001148 seconds from NIST. How accurate a server are you looking for, Ask?

I could pay for DDoS protection, but they only offered 10Gig last I knew. The VPS is throttled to 2.5 for scheduling purposes. It is Gentoo with a Gentoo 5.0.7 kernel. It is paid for three years. Full cgroups and SELinux.

#31

Besides just allowing NTP traffic in iptables, you also need to make sure connection tracking is disabled for it.
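
One way to check whether NTP packets are still being tracked on a Linux box is to look for UDP port 123 entries in the conntrack table. The Python sketch below assumes the table is readable at `/proc/net/nf_conntrack` (which normally needs root and the conntrack module loaded) and that entries use the usual `sport=`/`dport=` tokens. The helper name `count_ntp_entries` is made up for illustration.

```python
# Assumption: typical Linux netfilter location of the conntrack table.
CONNTRACK_PATH = "/proc/net/nf_conntrack"
NTP_PORT_TOKENS = {"sport=123", "dport=123"}


def count_ntp_entries(lines):
    """Count conntrack entries that are UDP and involve port 123."""
    count = 0
    for line in lines:
        tokens = line.split()
        # Compare whole tokens so that e.g. dport=1234 does not match.
        if "udp" in tokens and NTP_PORT_TOKENS & set(tokens):
            count += 1
    return count


if __name__ == "__main__":
    try:
        with open(CONNTRACK_PATH) as table:
            tracked = count_ntp_entries(table)
    except OSError as err:
        # Missing file or no permission: nothing can be concluded here.
        print(f"cannot read {CONNTRACK_PATH}: {err}")
    else:
        print(f"{tracked} tracked NTP flows (0 is the goal on a pool server)")
```

A non-zero count suggests that NTP still needs to be exempted from tracking, for example with NOTRACK rules for UDP port 123 in the iptables raw table, covering both incoming and outgoing packets.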