Suggestions for monitors, as Newark fails a lot and the scores are dropped too quickly

I’m noticing a slightly different pattern. I’m based in the UK. I believe it is the hops between the UK and US causing the problem.

I have two network providers. I have two ntp servers. I run both IPv4 and IPv6. I present both servers on both providers.

The first provider has a great reputation and provides native IPv6. On IPv4 that provider has struggled to stay above a score of 10 over the past few days. That same provider on IPv6 has consistently scored 20 for days.

The second provider, more of a residential broadband offering, has consistently scored 20 for days.

The paths taken from the monitor to my hosts are different for the two providers.

In my case, I suspect a Tier 1 carrier between the UK and US is dropping UDP as some kind of mitigation. I see less packet loss on protocol 13 (Daytime) and protocol 37 (Time) over UDP, and no loss at all for those old protocols over TCP.
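To compare UDP/123 reachability against the older protocols yourself, a hand-rolled client-mode NTP probe is enough. This is only a sketch; the target host and timeout are placeholders for whatever server you want to test:

```python
import socket

def build_ntp_request() -> bytes:
    """Build a minimal 48-byte NTPv4 client packet.

    First byte: LI=0, VN=4, Mode=3 (client) -> 0x23; rest zeroed.
    """
    return bytes([0x23]) + bytes(47)

def probe_ntp(host: str, timeout: float = 2.0) -> bool:
    """Send one NTP client packet over UDP/123; True if any reply arrives."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.sendto(build_ntp_request(), (host, 123))
        data, _ = s.recvfrom(512)
        return len(data) >= 48
    except socket.timeout:
        return False
    finally:
        s.close()
```

Running the same probe from hosts on both sides of the suspect path, alongside a plain TCP connection to port 13 or 37, helps show whether only UDP/123 is being filtered.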

So there appears to be a difference between IPv4 and IPv6 (not commenting on 6in4) when comparing network providers. I’m using AS org as a proxy for network provider.

My first provider has a total of 25 hosts in the UK pool (IPv4 and IPv6). For that provider, across both IPv4 and IPv6, the average score is 9.72. For that same provider, the average IPv4 score is 2.74. And for IPv6, the average is 17.58.

AS Org combined IPv4 and IPv6:
[screenshot: AS-combined]

AS Org IPv4:
[screenshot: AS-IPv4]

AS Org IPv6:
[screenshot: AS-IPv6]
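For what it's worth, per-AS averages like the ones above can be computed with a short script once you have per-server scores. The `(as_org, ip_version, score)` tuple layout here is my own assumption for illustration, not an export format of the pool:

```python
from collections import defaultdict

def average_scores(servers):
    """Average pool scores per AS org.

    servers: iterable of (as_org, ip_version, score) tuples,
    where ip_version is "v4" or "v6".
    Returns {as_org: {"combined": avg, "v4": avg, "v6": avg}}
    (keys only present where data exists).
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for as_org, ip_version, score in servers:
        buckets[as_org]["combined"].append(score)
        buckets[as_org][ip_version].append(score)
    return {
        org: {key: sum(vals) / len(vals) for key, vals in groups.items()}
        for org, groups in buckets.items()
    }
```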

It almost sounds like we need two monitoring systems: one to find downed servers and another to find alligator moats (paths where packets get filtered). Then I suppose the pool could promote moatless up servers, demote unmoated down servers, and ignore moated servers until the moat drains. I could be wrong about all of that.

“I believe it is the hops between the UK and US causing the problem.”

That is the likely cause: See
https://weberblog.net/ntp-filtering-delay-blockage-in-the-internet/

It seems not all providers apply NTP filtering, since the Los Angeles monitor's probes reach my servers with no problem, and so do their answers. Same with the Amsterdam monitor. Scores in the beta system for those are a boring 20 all the time.

But the answer packets to Newark are frequently discarded. Please note that I can see all probe packets reaching my servers; it is apparently only their answer packets that are discarded. Oh, and only over IPv4, since the scores for the IPv6 addresses (same 2 servers) are an unwavering 20 in Newark.

In the last months, both IPv4 server addresses have been mostly out of the pool. The beta site suggests that would not be the case if it were the production monitor. :wink:

@marco.davids , this does seem to be a common setup and common issue, for residential servers anyhow. I have a stratum 1 on a residential cable internet provider and have been trying to figure this out recently with just this same type of behavior.

I’m surprised there is no clear step-by-step guide for getting better performance from residential servers in this situation. The limitation seems to be a conflict between the need for NAT on a residential NTP server and the need to turn off connection tracking to get better performance.

The constraints are:
(a) gotta keep the internet working because work depends on it, so no wacky experimental device that falls over sometimes
(b) only 1 public IP address from a residential ISP
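One commonly suggested approach, sketched below, is to exempt NTP traffic from connection tracking in the netfilter raw table, so the conntrack table cannot fill up with UDP/123 entries. This is a sketch, not a tested recipe for any particular router, and it assumes port 123 is handled on the NAT box itself (or statically forwarded), since untracked packets bypass NAT state:

```shell
# Skip connection tracking for NTP (UDP port 123) in both directions.
iptables -t raw -A PREROUTING -p udp --dport 123 -j NOTRACK
iptables -t raw -A OUTPUT     -p udp --sport 123 -j NOTRACK
```

Note that with NOTRACK, stateful rules like `-m conntrack --ctstate ESTABLISHED` no longer match these packets, so the filter chains need an explicit accept for UDP port 123.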

Something happened two days ago; network conditions became a lot better. My servers are now scoring 20, and I do not remember when that was last the case:

The .de zone is again flapping by 20% because of the monitoring system's connectivity. pool.ntp.org: NTP Servers in Germany, de.pool.ntp.org

After 3 weeks of score 20, something suddenly changed in the network conditions :frowning_face: :

Sorry, it looks like it was a temporary routing problem. A new route is now established that seems to have even less routing asymmetry, about 1-2 msec better:

Hi all,

I decided to add my servers to the pool again, so far it looks like all monitor problems are solved.

https://www.ntppool.org/a/bas

Somehow the monitor is on par with my own measurements.

Looks promising :grin:

You lucky one…
Two of my servers "got dumped", IPv4 only. The same servers over IPv6 are stable.
[screenshot: pool.ntp.org statistics for 37.120.178.73, 2021-07-12 19:35]

Nearly 270 fewer IPv4 servers in Europe since yesterday, 110 of those 270 just from Germany :frowning:

Same picture here (Switzerland) - IPv4 only as well, IPv6 is happy and stable. Does anyone have the IP address of the San Jose monitoring station?

Same issue in Switzerland and France:

  • One system in France in AS203476 has score 20 over IPv6, score 10 over IPv4
  • One system in Switzerland in AS13030 has score 20 over IPv6, score -42 on IPv4
  • Two systems in Switzerland in AS3303 have score -43 and -53 over IPv4 (but I’m sure they are working correctly)

It would be great to have the IP and corresponding AS number of the monitoring station. But even more important, I guess, would be emergency contact details for the operator of the monitoring station. I doubt that it is useful to report this here.

Mine are dumping as before:

1626141811,“2021-07-13 02:03:31”,-0.001366954,1,8.6,0,
1626140586,“2021-07-13 01:43:06”,0,-5,8,9,“San Jose, CA, US”,“i/o timeout”
1626140586,“2021-07-13 01:43:06”,0,-5,8,“i/o timeout”
1626139356,“2021-07-13 01:22:36”,0,-5,13.7,9,“San Jose, CA, US”,“i/o timeout”
1626139356,“2021-07-13 01:22:36”,0,-5,13.7,“i/o timeout”

This happens randomly and makes the scores go bad.
Shame it’s still not fixed.
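For reference, log lines like the ones above can be parsed to count timeouts per monitor. This is a sketch; the column layout is inferred from the sample (error text in the last field, monitor name in the field before it when present) and may not match the pool's actual CSV export:

```python
import csv
import io
from collections import Counter

def _is_number(field: str) -> bool:
    try:
        float(field)
        return True
    except ValueError:
        return False

def count_timeouts(log_text: str) -> Counter:
    """Count 'i/o timeout' rows per monitor name in a pool score log."""
    timeouts = Counter()
    for row in csv.reader(io.StringIO(log_text)):
        if not row or row[-1].strip() != "i/o timeout":
            continue
        # The monitor name (e.g. "San Jose, CA, US") precedes the error
        # when the row carries one; otherwise attribute to "(unknown)".
        if len(row) >= 2 and not _is_number(row[-2]):
            timeouts[row[-2]] += 1
        else:
            timeouts["(unknown)"] += 1
    return timeouts
```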

Oh well, time to delete my servers again as this makes no sense.
The problem is still the monitor and not our servers.

FWIW, there are several pool members in .cz zone which also experience score drop over IPv4 since the last weekend. My server is 147.251.48.140/2001:718:801:230::8c connected to AS2852. The IPv6 score is fine.

At this time, I can’t even get to the management site (same with the beta
site):

503 Service Unavailable

No server is available to handle this request.

This has happened for years and years and years, nothing new to it, and Ask doesn't fix it.
I've set mine up to support IPv6 as well in the past, and IPv6 was fine while IPv4 continued to fail.

The problem is the system Ask built; it seems he doesn't care about IPv4, and as such we all drop, all the time.

I gave up, again, on the NTP pool as it's not worth the time because Ask will not fix his IPv4 monitor.

You can ask all you want, nothing happens. And no, it's not your server, it's not.

They send you running in circles, and IPv4 continues to fail due to a bad monitor for IPv4.

Good luck getting it fixed, as they won't. :face_with_symbols_over_mouth:

I added my servers again, but this time changed the DNS entries.
Normally the TTL is 3600 seconds; I changed that to 86400 seconds.
Maybe the monitor uses various DNS servers that do not cache long enough.
By changing the DNS cache lifetime from 1 hour to 24 hours, it may avoid timing out on DNS lookups.

As most zones have a default TTL of 3600, caching DNS servers need to refresh a lot, and a new request may take a long time when it hits a very busy DNS server.

Just an idea of mine; let's see what the drops are now.

Bas: When adding servers, the pool server management website looks up the IP address(es) for the given hostname and adds the IP addresses to the pool. After that, no DNS lookups are performed, and whatever you set as the TTL in your DNS setup does not matter.

You could be right.
Time will tell, as so far the monitor works fine.
It shows downtime, my own doing, but no problems so far.

Maybe some routes have been fixed or something else changed.
We will see.

Thanks for the remark.