Additional monitoring servers (help wanted)

NoahMcNallie · March 27, 2019, 6:50am

I really hope this gets done. All my servers are dropping out as soon as they score up. There is nothing wrong with them that they can’t handle, though.

Noah

erayd · March 28, 2019, 12:58am

All my servers are dropping out as soon as they score up. There is nothing wrong with them that they can’t handle though.

If they are only dropping out when they get added to the rotation (score >= 10), then the issue is unlikely to be monitoring, and there is probably actually something wrong on your end. If the problem were a monitoring issue, the servers would also be dropping out while their scores are below 10.

If the issue is not with the server, then I’d recommend you take a look at the network - in particular, check that connection tracking (and, by implication, NAT) is disabled for NTP packets in both directions. The server is not the only place where NTP traffic can cause load problems. Depending on your network topology, this may need to be configured on more than one router.
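To put rough numbers on why connection tracking matters at pool query rates: every tracked UDP flow holds a conntrack entry until it times out, so steady-state occupancy is roughly query rate × UDP timeout. This is a back-of-the-envelope sketch under stated assumptions, not a measurement: it assumes each query comes from a distinct client address/port, a 30-second UDP timeout (the usual Linux default for `nf_conntrack_udp_timeout`), and a hypothetical table limit of 65536; check the real values under `/proc/sys/net/netfilter/` on your own routers and hosts.

```python
"""Rough estimate of conntrack table pressure from NTP traffic.

Assumptions (not measured values): every query arrives from a distinct
client address/port, so each one creates its own conntrack entry, and
entries live for the full UDP timeout before being evicted.
"""


def conntrack_entries(queries_per_second: float, udp_timeout_s: float = 30.0) -> int:
    """Steady-state number of tracked UDP flows (Little's law: L = rate * time)."""
    return round(queries_per_second * udp_timeout_s)


def table_fill(queries_per_second: float, table_max: int,
               udp_timeout_s: float = 30.0) -> float:
    """Fraction of the conntrack table consumed by NTP flows alone."""
    return conntrack_entries(queries_per_second, udp_timeout_s) / table_max


if __name__ == "__main__":
    # "A few thousand requests a second", taken as 3000 qps for illustration.
    qps = 3000
    # Hypothetical limit; the real one is nf_conntrack_max on Linux.
    table_max = 65536
    entries = conntrack_entries(qps)
    print(f"{entries} tracked flows, {table_fill(qps, table_max):.0%} of the table")
```

Under these assumptions the table would need about 90,000 entries, more than the assumed limit, and once a Linux conntrack table is full, new flows are dropped. Exempting UDP port 123 from tracking (for example with `-j CT --notrack`, or the older `-j NOTRACK`, in the iptables `raw` table) removes this pressure entirely.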

Something else to consider: when I initially set up an NTP server on my current ISP, it tripped their DDoS-mitigation rules, and much of the traffic was dropped before it even reached me. This was resolved by simply calling them and having my server’s IP excluded from that monitoring. If you are certain that you have ruled out both your server and your network as potential causes, then talking to your upstream(s) may be a good idea.

alica · March 28, 2019, 1:21pm

It is. Quoting from my server’s monitor log:

1553776099,"2019-03-28 12:28:19",0.035440914,1,-1.9,1,"Los Angeles",0,
1553776099,"2019-03-28 12:28:19",0.035440914,1,-1.9,,,0,
1553775143,"2019-03-28 12:12:23",0,-5,-3.1,1,"Los Angeles",,"i/o timeout"
1553775143,"2019-03-28 12:12:23",0,-5,-3.1,,,,"i/o timeout"
1553774179,"2019-03-28 11:56:19",0.033143745,1,2,1,"Los Angeles",0,
1553774179,"2019-03-28 11:56:19",0.033143745,1,2,,,0,
1553773207,"2019-03-28 11:40:07",0.036694936,1,1.1,1,"Los Angeles",0,
1553773207,"2019-03-28 11:40:07",0.036694936,1,1.1,,,0,
1553772238,"2019-03-28 11:23:58",0.037754256,1,0.1,1,"Los Angeles",0,
1553772238,"2019-03-28 11:23:58",0.037754256,1,0.1,,,0,
1553771333,"2019-03-28 11:08:53",0.032810487,1,-1,1,"Los Angeles",0,
1553771333,"2019-03-28 11:08:53",0.032810487,1,-1,,,0,
1553770236,"2019-03-28 10:50:36",0,-5,-2.1,1,"Los Angeles",,"i/o timeout"
1553770236,"2019-03-28 10:50:36",0,-5,-2.1,,,,"i/o timeout"
1553769058,"2019-03-28 10:30:58",0.036298195,1,3.1,1,"Los Angeles",0,
1553769058,"2019-03-28 10:30:58",0.036298195,1,3.1,,,0,
1553767887,"2019-03-28 10:11:27",0,-5,2.2,1,"Los Angeles",,"i/o timeout"
1553767887,"2019-03-28 10:11:27",0,-5,2.2,,,,"i/o timeout"
1553766671,"2019-03-28 09:51:11",0.026999843,1,7.6,1,"Los Angeles",0,
1553766671,"2019-03-28 09:51:11",0.026999843,1,7.6,,,0,
1553765564,"2019-03-28 09:32:44",0.034973041,1,6.9,1,"Los Angeles",0,
1553765564,"2019-03-28 09:32:44",0.034973041,1,6.9,,,0,
1553764393,"2019-03-28 09:13:13",0.033346673,1,6.2,1,"Los Angeles",0,
1553764393,"2019-03-28 09:13:13",0.033346673,1,6.2,,,0,
1553763277,"2019-03-28 08:54:37",0,-5,5.5,1,"Los Angeles",,"i/o timeout"
1553763277,"2019-03-28 08:54:37",0,-5,5.5,,,,"i/o timeout"
1553762148,"2019-03-28 08:35:48",0.032394437,1,11.1,1,"Los Angeles",0,
1553762148,"2019-03-28 08:35:48",0.032394437,1,11.1,,,0,
1553760980,"2019-03-28 08:16:20",0.028899206,1,10.6,1,"Los Angeles",0,
1553760980,"2019-03-28 08:16:20",0.028899206,1,10.6,,,0,
1553759770,"2019-03-28 07:56:10",0.034931327,1,10.1,1,"Los Angeles",0,
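A quick way to quantify a log like this is to count the per-monitor probes that ended in `i/o timeout`. This is a sketch, and the column layout it assumes (epoch, timestamp, offset, step, score, monitor id, monitor name, leap, error) is inferred from the rows above rather than taken from any documentation. Rows with an empty monitor id are treated as the overall-score entries that duplicate each probe, and rows without all nine fields are skipped rather than guessed at.

```python
import csv
import io


def timeout_stats(log_text: str) -> tuple[int, int]:
    """Return (timeouts, probes) counted over per-monitor rows only.

    Assumed columns, inferred from the pasted log:
    0 epoch, 1 timestamp, 2 offset, 3 step, 4 score,
    5 monitor id, 6 monitor name, 7 leap, 8 error
    """
    timeouts = probes = 0
    for row in csv.reader(io.StringIO(log_text)):
        # Skip blank or short lines and the overall-score rows (empty
        # monitor id), so that each probe is counted exactly once.
        if len(row) < 9 or not row[5]:
            continue
        probes += 1
        if row[8] == "i/o timeout":
            timeouts += 1
    return timeouts, probes
```

Run over the per-monitor rows pasted above, this counts 4 timeouts in 16 probes, i.e. 25% of probes from Los Angeles going unanswered in that window.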

erayd · March 28, 2019, 3:30pm

Not everybody has the same issue. NoahMcNallie indicated that their server was only having issues once it was added to the rotation (i.e. score >= 10), which indicates that monitoring isn’t the issue in that case.

In your case, the issue appears to be different, as the monitor is still complaining about it regardless of whether or not the server is part of the pool rotation. This may or may not be a symptom of problems with the monitoring; all it tells us is that the monitor is frequently attempting to probe your server and receiving no reply. There are many possible reasons for that to occur. I note that you have been active in the China thread - be aware that there are issues with monitoring servers behind the Great Firewall, so if your server is inside China that may well be what is happening here. If your server is (or was) in the China zone, regardless of where it is geographically located, the load is also a bit of a special case.

alica · March 28, 2019, 3:53pm

Nope, I am in Taiwan, not behind the GFW, but as previously stated, I have also faced some network problems on the American continent. I don’t think a normal America-to-Asia packet should traverse the American continent twice… LAX to ASH to NYC to LAX before leaving US soil.

NoahMcNallie · March 28, 2019, 4:08pm

There are multiple things going on, I think. Both VPSes, the one in China and the one in Canada, are getting a few thousand NTP requests a second, even though the China VPS has been set to 384KiBit, and both are falling out. They are serving a purpose, though, and I think it should clear up. My American VPS’s IPv4 score is very low too and looks to have fallen out recently; this one usually stays above the rotation threshold. Also, the OVH VPS in Canada has been under DDoS on and off since I purchased it. Pretty sure they sold it to me like that.
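For a sense of scale, "a few thousand requests a second" can be turned into bandwidth. A sketch, assuming plain NTPv4 packets with no extension fields or MAC (48-byte payload), a 20-byte IPv4 header without options, and an 8-byte UDP header, while ignoring link-layer framing; the 3000 qps figure is only an illustrative reading of "a few thousand".

```python
NTP_PAYLOAD = 48   # bytes: NTP header without extension fields or MAC
UDP_HEADER = 8     # bytes
IPV4_HEADER = 20   # bytes, no IP options


def ntp_bits_per_second(queries_per_second: float) -> float:
    """Bandwidth of NTP traffic in one direction, excluding link-layer framing.

    Each query gets one same-sized response, so the figure applies to
    inbound requests and to outbound replies separately.
    """
    packet_bytes = NTP_PAYLOAD + UDP_HEADER + IPV4_HEADER  # 76 bytes at the IP layer
    return queries_per_second * packet_bytes * 8


if __name__ == "__main__":
    mbps = ntp_bits_per_second(3000) / 1e6
    print(f"{mbps:.3f} Mbit/s each way")  # 3000 * 76 * 8 = 1,824,000 bit/s
```

Counting requests and replies together doubles the figure; real on-the-wire usage is somewhat higher once Ethernet framing is included.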

Noah

NoahMcNallie · March 29, 2019, 8:52pm

Not to take this thread further off topic than it already is… but montreal.ca.logiplex.net made it to a score above 11 (11.4 right now) in CN at 100MiBit, for the first time that I have noticed. I think this might be a sign of good things to come. These VPSes are configured the same as my other RHEL-family VPS servers (well, Fedora and two CentOS), which is basically `yum update`, iptables and ip6tables rules allowing port 123, a few sysctl parameters, and stock ntpd configured with stratum 1 servers that are local to them. Nothing more than this is done to them. I don’t use them for anything else, and the other one was doing fine before without being modified. That one I am kind of confused about, but I guess Montreal’s drops are a mixture of the DDoS it has been experiencing and the monitoring station. I’m not claiming beyond doubt that nothing could possibly be wrong with them, but I don’t think so. They query fine from around the world. If anything, there would be some sort of rate limiting happening in China and/or at the ISP.

Noah

NoahMcNallie · March 29, 2019, 10:50pm

One last thing: I do have a Sun T2000, which is one of the last machines Sun made before it was sold to Oracle. It has 64 Gigs of 16-channel DDR2, an eight-core UltraSPARC T1, and an unused 72GiByte 15K RPM SAS drive. Once I get it up and running, it would be at my discretion, and it would not be anywhere with DDoS protection. I plan to move soon to somewhere I could probably run it on something like a 10/100 D3 line.

Noah

magnusg · April 2, 2019, 7:32pm

I have the same problem from Sweden; my score is not getting above 0 anymore…
My own external monitoring shows about 0.03% failure; from LA, about 30% failure…

1554224910,"2019-04-02 17:08:30",-0.002919288,1,-25.5,1,"Los Angeles",0,
1554224910,"2019-04-02 17:08:30",-0.002919288,1,-25.5,0,
1554223802,"2019-04-02 16:50:02",0.001310674,1,-27.9,1,"Los Angeles",0,
1554223802,"2019-04-02 16:50:02",0.001310674,1,-27.9,0,
1554222744,"2019-04-02 16:32:24",-0.005134751,1,-30.4,1,"Los Angeles",0,
1554222744,"2019-04-02 16:32:24",-0.005134751,1,-30.4,0,
1554221679,"2019-04-02 16:14:39",0,-5,-33.1,1,"Los Angeles","i/o timeout"
1554221679,"2019-04-02 16:14:39",0,-5,-33.1,"i/o timeout"
1554220590,"2019-04-02 15:56:30",0,-5,-29.6,1,"Los Angeles","i/o timeout"
1554220590,"2019-04-02 15:56:30",0,-5,-29.6,"i/o timeout"
1554219516,"2019-04-02 15:38:36",0,-5,-25.9,1,"Los Angeles","i/o timeout"
1554219516,"2019-04-02 15:38:36",0,-5,-25.9,"i/o timeout"
1554218350,"2019-04-02 15:19:10",0,-5,-21.9,1,"Los Angeles","i/o timeout"
1554218350,"2019-04-02 15:19:10",0,-5,-21.9,"i/o timeout"
1554217239,"2019-04-02 15:00:39",-0.002086741,1,-17.8,1,"Los Angeles",0,
1554217239,"2019-04-02 15:00:39",-0.002086741,1,-17.8,0,
1554216104,"2019-04-02 14:41:44",0,-5,-19.8,1,"Los Angeles","i/o timeout"
1554216104,"2019-04-02 14:41:44",0,-5,-19.8,"i/o timeout"
1554214869,"2019-04-02 14:21:09",-0.003405503,1,-15.6,1,"Los Angeles",0,
1554214869,"2019-04-02 14:21:09",-0.003405503,1,-15.6,0,
1554213772,"2019-04-02 14:02:52",0,-5,-17.5,1,"Los Angeles","i/o timeout"
1554213772,"2019-04-02 14:02:52",0,-5,-17.5,"i/o timeout"

NoahMcNallie · April 18, 2019, 12:24am

I just tested logiplex.net on KVM with a TSC clocksource, and it differs by 0.001148 seconds from NIST. How much accuracy are you looking for, Ask?
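For reference, an offset like this is what an NTP client derives from a request/response exchange with a server. Using the four timestamps defined in RFC 5905 (T1 client transmit, T2 server receive, T3 server transmit, T4 client receive), the offset is ((T2 − T1) + (T3 − T4)) / 2 and the round-trip delay is (T4 − T1) − (T3 − T2). The sketch below just evaluates those two formulas; the timestamps in it are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """Timestamps (seconds) of one NTP client/server exchange, as in RFC 5905."""
    t1: float  # client transmit time (client clock)
    t2: float  # server receive time (server clock)
    t3: float  # server transmit time (server clock)
    t4: float  # client receive time (client clock)

    def offset(self) -> float:
        """Estimated server clock minus client clock; assumes symmetric paths."""
        return ((self.t2 - self.t1) + (self.t3 - self.t4)) / 2

    def delay(self) -> float:
        """Round-trip network delay, excluding the server's processing time."""
        return (self.t4 - self.t1) - (self.t3 - self.t2)


if __name__ == "__main__":
    # Made-up example: 20 ms each way, server clock 1 ms ahead of the client.
    ex = Exchange(t1=100.000, t2=100.021, t3=100.022, t4=100.041)
    print(f"offset={ex.offset():.3f}s delay={ex.delay():.3f}s")
```

Because the offset formula assumes equal one-way delays, any path asymmetry between client and server shows up directly as offset error, at most half the asymmetry.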

I could pay for DDoS protection, but last I knew they only offer 10Gig. The VPS is throttled to 2.5 for scheduling purposes. It runs Gentoo with a Gentoo 5.0.7 kernel and is paid up for three years, with full cgroups and SELinux.

kenyon · May 13, 2019, 1:52pm

Besides allowing NTP traffic in iptables, you also need to make sure connection tracking is disabled for it.

csweeney05 · June 26, 2019, 7:34pm

I’m willing to host a site here as well.

c.barthes4 · July 27, 2019, 4:24pm

@ask: how can I participate in running the monitoring servers?
Chris

russ · August 1, 2019, 2:52pm

I’d be happy to run some monitors, if still needed. I have physical servers in Montreal (OVH BHS) and Nuremberg (Hetzner).

Bas · October 19, 2019, 2:52pm

Hi Ask,
I think I have found the root of the IPv4 problems… namely, load balancers.
Have a look here:

As the current Newark monitor’s traffic passes through 2 routers, if you run mtr with UDP packets for some time, you will see a second router pop up.

The problem is that UDP traffic trying to pass through the second router is being dropped.

In short, this is what happens: if your NTP monitor sends a packet from e.g. 10.0.0.10 and it passes router1 at 10.0.0.1, the request reaches my server and is answered.
However, the reply to that same IP could try to pass router2 at 10.0.0.2, and this router isn’t expecting a UDP packet from me, as it never knew your monitor had sent one.
So it doesn’t know the destination and drops the UDP packet.
This only happens with IPv4 when IP binding/load balancing (multiple network connections) is used.
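The failure mode described here is asymmetric routing through stateful devices: the request and its reply take different paths, and the device on the return path holds no state for the flow. Below is a toy model of that idea only, not of Packet’s actual network. The router names, the addresses (10.0.0.10 from the example above, plus the documentation address 192.0.2.123 standing in for an NTP server), and the rule that a router forwards an inbound UDP reply only if it saw the matching outbound request are all hypothetical simplifications.

```python
"""Toy model: a UDP reply is dropped when it returns via a stateful router
that never saw the outbound request. Everything here is hypothetical."""

from typing import Set, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class StatefulRouter:
    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: Set[Flow] = set()  # outbound flows this router has forwarded

    def forward_out(self, flow: Flow) -> None:
        """Record an outbound request so its reply can be matched later."""
        self.seen.add(flow)

    def accept_in(self, reply: Flow) -> bool:
        """Accept an inbound reply only if it reverses a flow we saw leave."""
        src_ip, src_port, dst_ip, dst_port = reply
        return (dst_ip, dst_port, src_ip, src_port) in self.seen


def probe(out_router: StatefulRouter, in_router: StatefulRouter) -> bool:
    """Send one NTP query via one router; the reply returns via another."""
    request: Flow = ("10.0.0.10", 40000, "192.0.2.123", 123)  # monitor -> NTP server
    reply: Flow = ("192.0.2.123", 123, "10.0.0.10", 40000)    # NTP server -> monitor
    out_router.forward_out(request)
    return in_router.accept_in(reply)


if __name__ == "__main__":
    r1, r2 = StatefulRouter("router1"), StatefulRouter("router2")
    print("symmetric path answered:", probe(r1, r1))
    print("asymmetric path answered:", probe(r1, r2))
```

In the model, the probe whose reply returns through the same router is answered and the one returning through the other router is dropped; intermittent timeouts would follow only if the load balancer spreads return traffic across both routers, which is the hypothesis in this post rather than something the model demonstrates.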

IPv6 doesn’t have this problem because every server has its own unique address, so there is no ambiguity about the destination.

I’m not quite sure this is your problem, but it seems quite possible, because IPv6 users have no problems while IPv4 users do.

Many Docker and other VM users complain about UDP drops at such hosting providers; it’s not uncommon.

I do know Cinfu.com has very cheap VPSes, and they have not had this problem since they updated their server VM software.
I added 2 of those to my NTP pool servers, and they tick perfectly without problems.
You could ask them to run a monitor for you.

Or ask Packet to route your IP via just 1 router, without IP binding or load balancing; if I’m correct, it should be fine then.

Bas.

Bas · October 20, 2019, 12:45pm

@Ask, I have contacted Cinfu, and they are willing to supply you with a location for this purpose.
You should have more details via email.

Davo · November 26, 2019, 7:56am

I have a Stratum 1 source and a server that keeps good time. Based in Australia.
