Monitoring stations timeout to our NTP servers

Same here,

Server with IPv4 & IPv6.
IPv6 score drops below 14 and IPv4 score is stable at 20.

@ask, where are you with it now?
Is the code accessible that one can submit possible fixes?

I tried to add my servers to this development monitoring system, but it does not seem to work.

I am also having the same issue.
On the production ntppool:
https://www.pool.ntp.org/scores/45.76.111.149
https://www.pool.ntp.org/scores/2401:c080:1000:4175:5400:2ff:fe32:f445

Server is based in Japan.
IPv6 is solid; IPv4 keeps oscillating.
I was almost pulling my hair out, since all my tests from multiple servers and locations (US, IN, Japan) were 100% positive.

Now I have added these to the beta pool.
https://web.beta.grundclock.com/scores/45.76.111.149
https://web.beta.grundclock.com/scores/2401:c080:1000:4175:5400:2ff:fe32:f445

These were only added today, so they haven't yet reached the perfect 20. However, the trend is clearly visible, especially for the IPv4 interface. Newark keeps reporting a timeout; I'm not sure whether it is behaving any better than on the production pool?

Any ideas how to improve the situation? Because of this issue, I guess the pool's effective capacity is much less than it should be. If the transport-related issues are not easily solvable, is it an option to tweak the score downgrade that the Newark monitor applies, so that servers are not downgraded so aggressively? What is also puzzling is why the transport is OK for some polls and not for others.
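For context on how aggressive the downgrade is: the score progression in the CSV logs further down this thread is consistent with a simple exponentially weighted update, roughly new_score = 0.95 * old_score + step, with a step of +1 for a good response and -5 for a timeout. This is an inference from the log data, not confirmed from the pool's source code, but a quick sketch shows why timeouts hurt so much:

```python
# Sketch of the score update the pool's logs appear to follow (an
# inference from the CSV "step" and "score" columns, not the confirmed
# production algorithm): new_score = 0.95 * old_score + step.

def update(score: float, step: float) -> float:
    """Apply one monitoring sample: step is +1 (good) or -5 (timeout)."""
    return 0.95 * score + step

# A server that always responds converges to the fixed point
# s = 0.95 * s + 1, i.e. s = 1 / (1 - 0.95) = 20 -- the "perfect 20".
score = 0.0
for _ in range(300):
    score = update(score, +1)
print(round(score, 2))  # ~20.0

# A single timeout from a perfect 20 drops the score to 0.95*20 - 5 = 14,
# and a second consecutive one to 0.95*14 - 5 = 8.3 -- below the ~10
# needed to stay in the pool's rotation.
print(update(20, -5))            # 14.0
print(round(update(14, -5), 1))  # 8.3
```

Under this model, a monitor that times out even a few percent of the time keeps knocking an otherwise healthy server in and out of the pool, which matches the "oscillating" graphs people are describing.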

I have the same problem here in Italy, but only from the ISPs Fastweb and Tiscali:
IPv4 score floating between 10 and 14
IPv6 score 20

From the ISP Wind I have:
IPv4 score 20
IPv6 score 20

1565817537,"2019-08-14 21:18:57",0.00379887,1,7.3,,,0,
1565816626,"2019-08-14 21:03:46",0,-5,6.6,6,"Newark, NJ, US",,"i/o timeout"
1565816626,"2019-08-14 21:03:46",0,-5,6.6,,,,"i/o timeout"
1565815642,"2019-08-14 20:47:22",0.001398844,1,12.3,6,"Newark, NJ, US",0,
1565815642,"2019-08-14 20:47:22",0.001398844,1,12.3,,,0,
1565814669,"2019-08-14 20:31:09",0.001399639,1,11.9,6,"Newark, NJ, US",0,
1565814669,"2019-08-14 20:31:09",0.001399639,1,11.9,,,0,
1565813742,"2019-08-14 20:15:42",0.002030171,1,11.4,6,"Newark, NJ, US",0,
1565813742,"2019-08-14 20:15:42",0.002030171,1,11.4,,,0,
1565812824,"2019-08-14 20:00:24",0.002168236,1,11,6,"Newark, NJ, US",0,
1565812824,"2019-08-14 20:00:24",0.002168236,1,11,,,0,
1565811916,"2019-08-14 19:45:16",0.002224739,1,10.5,6,"Newark, NJ, US",0,
1565811916,"2019-08-14 19:45:16",0.002224739,1,10.5,,,0,
1565810982,"2019-08-14 19:29:42",0.001927572,1,10,6,"Newark, NJ, US",0,
1565810982,"2019-08-14 19:29:42",0.001927572,1,10,,,0,
1565809981,"2019-08-14 19:13:01",0.002043029,1,9.5,6,"Newark, NJ, US",0,
1565809981,"2019-08-14 19:13:01",0.002043029,1,9.5,,,0,
1565808980,"2019-08-14 18:56:20",0.001458194,1,8.9,6,"Newark, NJ, US",0,
1565808980,"2019-08-14 18:56:20",0.001458194,1,8.9,,,0,
1565808019,"2019-08-14 18:40:19",0.001889322,1,8.3,6,"Newark, NJ, US",0,
1565808019,"2019-08-14 18:40:19",0.001889322,1,8.3,,,0,
1565807110,"2019-08-14 18:25:10",0,-5,7.7,6,"Newark, NJ, US",,"i/o timeout"
1565807110,"2019-08-14 18:25:10",0,-5,7.7,,,,"i/o timeout"
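For anyone digging through these CSV score logs: the columns appear to be timestamp, date, offset, step, score, monitor id, monitor name, leap, and error (inferred from the rows above, not from a documented schema). A small sketch that tallies timeouts per monitor:

```python
import csv
import io
from collections import Counter

# Column names inferred from the log rows above (not a documented schema).
FIELDS = ["ts", "date", "offset", "step", "score",
          "monitor_id", "monitor_name", "leap", "error"]

# A few sample rows copied from the log above.
LOG = '''\
1565816626,"2019-08-14 21:03:46",0,-5,6.6,6,"Newark, NJ, US",,"i/o timeout"
1565815642,"2019-08-14 20:47:22",0.001398844,1,12.3,6,"Newark, NJ, US",0,
1565807110,"2019-08-14 18:25:10",0,-5,7.7,6,"Newark, NJ, US",,"i/o timeout"
'''

timeouts = Counter()
for row in csv.DictReader(io.StringIO(LOG), fieldnames=FIELDS):
    if row["error"] == "i/o timeout":
        timeouts[row["monitor_name"]] += 1

print(timeouts)  # Counter({'Newark, NJ, US': 2})
```

Pointing this at a full log download makes it easy to see whether the timeouts come from one monitor or from all of them.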

The beta site also has a monitor in Amsterdam; I’m curious if (anecdotally) the Amsterdam one is getting different/better results. (Also, the beta site has a bazillion other changes around adding and managing the servers that could use some testing!)

https://manage-beta.grundclock.com/manage/servers

When I click on that link, it gives me a:

500 - Server Error

Ouch! That didn’t work, our server hit a bad gear.

I am logged into the beta site…

Me too. I’m sure it was working yesterday :upside_down_face:

Hi,
I’m happy to report that the IPv4 score of the server is now much better and as solid as the IPv6 side.
The last reported timeout from the Newark monitoring server was at the time below, and the last successful monitoring request was at 2019-08-18 07:05:41; so it has been ~5 days and the score is now a solid 20.

1565687391,"2019-08-13 09:09:51",0,-5,-2.8,6,"Newark, NJ, US","i/o timeout"

I have not changed anything impacting this on my end - so something else has changed for the good and hopefully it stays that way :slight_smile: !

Thanks!

Hi,

since yesterday at 21:00 CEST I have been getting this on all of my servers:

1566222048,"2019-08-19 13:40:48",0,-5,-8.7,6,"Newark, NJ, US","i/o timeout"

CU
Jörg

Hi Jörg, tracerouting to and from the monitoring servers may help you work out where the routing’s broken. This page https://dev.ntppool.org/monitoring/network-debugging/ gives the monitoring server IPs and a tool to traceroute back to your servers (replace the 8.8.8.8 with your IP).

Hi,

here is a traceroute:

curl http://trace.ntppool.org/traceroute/85.93.91.145
Traceroute to 85.93.91.145
1 (139.178.64.41) AS54825 20.300 20.262
2 (147.75.98.106) AS54825 11.808
2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825 8.295
3 0.ae12.bbr1.ewr1.packet.net (198.16.4.88) AS54825 1.812 1.794
4 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS54825 1.273 1.265
5 (64.125.54.25) AS6461 1.961
5 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS6461 1.218
6 (64.125.54.25) AS6461 2.165 *
7 * *
8 ae27.mpr2.lhr2.uk.zip.zayo.com (64.125.30.237) AS6461 67.003 *
9 ae27.mpr2.lhr2.uk.zip.zayo.com (64.125.30.237) AS6461 67.160 67.414
10 (195.66.225.173) 71.336
10 ae11.mpr1.lhr15.uk.zip.zayo.com (64.125.30.53) AS6461 66.790
11 (195.66.225.173) 71.373
11 ae0.cr-merak.lon5.bb.godaddy.com (87.230.113.1) AS20773 72.024
12 ae0.cr-merak.lon5.bb.godaddy.com (87.230.113.1) AS20773 72.182 72.135
13 (87.230.112.3) AS20773 81.000
13 ae0.cr-nunki.sxb1.bb.godaddy.com (87.230.113.3) AS20773 81.189
14 (87.230.112.3) AS20773 81.965
14 (62.138.129.10) AS20773 84.873
15 (62.138.129.10) AS20773 85.118 83.234
16 snct3.snct-dialer.de (85.93.91.145) AS8972 80.914 80.874

With ping I get a 404:

curl http://trace.ntppool.org/ping/85.93.91.145
404 page not found

Hi,

I’m trying to debug problems with the monitoring of my server. When I query trace.ntppool.org/traceroute using curl, it does not seem to work correctly; the trace is cut off after a few hops:

curl https://trace.ntppool.org/traceroute/62.197.224.14

Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  19.720  19.683
 2 0.xe-0-0-17.dsr2.ewr1.packet.net (147.75.98.106) AS54825  17.796  17.776
 3 (198.16.4.86) AS54825  0.540
 3 0.ae11.bbr1.ewr1.packet.net (198.16.4.84) AS54825  0.444
iwik@[avenger]:~$ curl https://trace.ntppool.org/traceroute/62.197.224.14
Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  11.906  11.869
 2 (147.75.98.104) AS54825  15.445
 2 0.xe-0-0-17.dsr2.ewr1.packet.net (147.75.98.106) AS54825  12.283
 3 (198.16.4.88) AS54825  0.516
 3 0.ae22.bbr2.ewr1.packet.net (198.16.4.90) AS54825  0.464
iwik@[avenger]:~$ curl https://trace.ntppool.org/traceroute/62.197.224.14
Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  14.637  14.591
 2 (147.75.98.106) AS54825  29.749
 2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825  13.531
iwik@[avenger]:~$ curl https://trace.ntppool.org/traceroute/62.197.224.14
Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  21.024  20.988
 2 (147.75.98.104) AS54825  20.128
 2 0.xe-0-0-17.dsr2.ewr1.packet.net (147.75.98.106) AS54825  20.041
 3 0.ae12.bbr1.ewr1.packet.net (198.16.4.88) AS54825  0.421  0.413
iwik@[avenger]:~$ curl https://trace.ntppool.org/traceroute/62.197.224.14
Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  11.749  11.705
 2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825  12.831  12.809

From a web browser the output is better:

Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  42.988  42.949
 2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825  21.007  20.992
 3 0.ae12.bbr1.ewr1.packet.net (198.16.4.88) AS54825  0.453  0.446
 4 (198.16.4.81) AS54825  0.269
 4 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS54825  1.090
 5 (64.125.54.25) AS6461  1.187
 5 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS6461  1.280
 6 64.125.54.25.available.above.net (64.125.54.25) AS6461  *  1.121
 7  *  *
 8 (64.125.29.127) AS6461  79.624
 8 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  79.210
 9 (64.125.29.16) AS6461  79.328
 9 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  79.201
10 (64.125.29.16) AS6461  79.188
10 ae0.cs1.ams17.nl.eth.zayo.com (64.125.29.81) AS6461  80.876
11 (64.125.29.58) AS6461  79.368
11 ae0.cs1.ams17.nl.eth.zayo.com (64.125.29.81) AS6461  80.699
12 ae0.cs1.fra9.de.eth.zayo.com (64.125.29.55) AS6461  79.216  79.319
13 (64.125.29.55) AS6461  79.301
13 ae27.mpr1.fra4.de.zip.zayo.com (64.125.30.255) AS6461  81.376
14 gwup.dc.ba.gts.sk (80.81.193.195)  96.288  97.063
15 (62.168.99.66) AS5578  98.704
15 gwup.dc.ba.gts.sk (80.81.193.195) AS5578  97.166
16 (195.168.61.115) AS5578  96.726
16 se-0-1-0-0.gwa.husarik.ca.gts.sk (62.168.99.66) AS5578  98.820
17 (62.197.224.14) AS16160  100.444
17 b3.ibm.ke.cust.gts.sk (195.168.61.115) AS16160  96.700
18  *  *
19  *  *

Next try:

Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  10.950  10.902
 2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825  12.076  12.060
 3 0.ae22.bbr2.ewr1.packet.net (198.16.4.90) AS54825  1.154  0.886
 4 (198.16.4.81) AS54825  0.486
 4 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS54825  1.121
 5 (192.80.8.99) AS54825  1.107
 5 64.125.54.25.available.above.net (64.125.54.25) AS54825  1.149
 6 64.125.54.25.available.above.net (64.125.54.25) AS6461  1.129  1.341
 7  *  *
 8 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  79.803  *
 9 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  77.266  77.432
10 (64.125.29.16) AS6461  77.468
10 ae0.cs1.ams17.nl.eth.zayo.com (64.125.29.81) AS6461  77.393
11 ae0.cs1.ams17.nl.eth.zayo.com (64.125.29.81) AS6461  77.359  77.547
12 (64.125.29.55) AS6461  101.687
12 ae2.cs1.fra6.de.eth.zayo.com (64.125.29.58) AS6461  77.459
13 (64.125.30.255) AS6461  77.253
13 ae0.cs1.fra9.de.eth.zayo.com (64.125.29.55) AS6461  92.128
14 gwup.dc.ba.gts.sk (80.81.193.195)  90.545  90.271
15 (62.168.99.66) AS5578  90.374
15 gwup.dc.ba.gts.sk (80.81.193.195) AS5578  90.226
16 (62.168.99.66) AS5578  90.242
16 b3.ibm.ke.cust.gts.sk (195.168.61.115) AS5578  89.786
17 (195.168.61.115) AS5578  89.767
17 cloud.zazezi.net (62.197.224.14) AS5578  94.957
18 cloud.zazezi.net (62.197.224.14) AS16160  93.752  93.786

According to this, the routing on the first attempt looks somewhat strange at the end?

Hi @jff, your server is currently showing a steady score over 10 and seems to be missing only a few monitoring pings, so it’s in the pool at the moment. The problem with internet routing is that it changes, so unless you happen to catch it while it’s not responding, it’s impossible to work out where the issue is… :frowning:

@iwik, I’m seeing similar drops at packet.net - maybe it’s worth an email to them (or your ISP) to see if they can investigate.

I’m seeing the same, much less reliable monitoring from NJ. (Still reachable fine in my own testing from various locations around the internet.)

The same problem here. I have an NTP server with IPv6 interface. Since the last two days ( https://www.ntppool.org/scores/2a00:6d40:60:80a0::1 ) my server started getting reachability errors from Newark, NJ monitoring station. I think there is a problem with that monitoring station. The NTP Status page ( https://status.ntppool.org/ ) shows a dramatic drop in IPv6 servers. The problem may be only on the IPv6 network side of the monitoring station.

Yes, I should have specified IPv6. https://www.ntppool.org/zone/north-america says 40% of NA IPv6 servers disappeared since yesterday…

Hi, one of the volunteer admins here. We’re seeing issues with IPv6 monitoring at the moment - I have flagged them to @ask to take a look when he’s next online.


My servers haven’t had monitoring problems over the last few months – as far as I know – but I do now.

https://www.ntppool.org/scores/2600:1f16:ec6:ec6c:72a0:38cf:14ef:8bc0
https://web.beta.grundclock.com/scores/2600:1f16:ec6:ec6c:72a0:38cf:14ef:8bc0
https://web.beta.grundclock.com/scores/13.58.6.55 (for comparison)

I don’t think there are problems on my end, but I can’t say for certain.

I don’t think there was – it affected ~25% of all the IPv6 servers. :frowning:

One interesting thing to learn from this is that two monitoring systems aren’t enough to be helpful for this sort of thing with the current scoring algorithm.

Maybe the system could have turned off the misbehaving monitor automatically, but in the past we’ve seen major IPv6 problems that really were real.

One example I remember was one of the big European IXes null-routing all IPv6 traffic (or maybe it was just NTP traffic?) for a day. That could have looked similar to the system, and I don’t know whether the right response in that case would have been to disbelieve the monitor.

So … we need more than two watches to tell the time, which I guess isn’t a surprise to this community! :smile:
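To make the "more than two watches" point concrete: with only two monitors a disagreement is unresolvable, but with three or more you can take the median verdict, and a single misbehaving monitor is simply outvoted. This is just a hypothetical illustration of the robustness argument, not the pool's actual scoring logic:

```python
from statistics import median

# Hypothetical per-check steps reported by each monitor:
# +1 = good response, -5 = i/o timeout.
def combined_step(votes: list) -> float:
    """Median of the monitors' verdicts; robust to a minority of bad monitors."""
    return median(votes)

# Two monitors, one broken: the median splits the difference,
# so a perfectly healthy server is still punished.
print(combined_step([+1, -5]))       # -2.0

# Three monitors, one broken: the faulty monitor is outvoted.
print(combined_step([+1, +1, -5]))   # 1
```

With a median, a bad monitor only matters once it is no longer in the minority, which also distinguishes the "one monitor misbehaving" case from a genuine outage like the IX null-route, where all monitors would agree.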


Hi @ask,

Do you need a VM in Switzerland for monitoring? Could it help the community?

Thanks,
Cheers