Monitoring stations time out to our NTP servers

Me too. I’m sure it was working yesterday :upside_down_face:

Hi,
I’m happy to report that the IPv4 score of the server is now much better and as solid as the IPv6 side.
The last reported timeout from the Newark monitoring server was at the time below, and the most recent successful monitoring request was 2019-08-18 07:05:41; so it’s been ~5 days without a timeout, and the score is now a solid 20.

1565687391,"2019-08-13 09:09:51",0,-5,-2.8,6,"Newark, NJ, US","i/o timeout"
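As an aside, a line like that is plain CSV and easy to sanity-check. A quick sketch (the field meanings here are my assumption based on the values shown, not a documented format):

```python
import csv
import io
from datetime import datetime, timezone

# One line from the pool's per-server log CSV. Assumed fields: epoch,
# timestamp (UTC), offset, score step, resulting score, monitor id,
# monitor name, error text.
line = '1565687391,"2019-08-13 09:09:51",0,-5,-2.8,6,"Newark, NJ, US","i/o timeout"'

epoch, ts, offset, step, score, mon_id, mon_name, error = next(
    csv.reader(io.StringIO(line)))

# Sanity check: the epoch column matches the human-readable UTC timestamp.
parsed = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
assert parsed.strftime("%Y-%m-%d %H:%M:%S") == ts

print(f"{mon_name}: step {step} -> score {score} ({error or 'ok'})")
```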

I have not changed anything impacting this on my end - so something else has changed for the better, and hopefully it stays that way :slight_smile: !

Thanks!

Hi,

since yesterday at 21:00 MESZ (CEST) I have been getting this on all of my servers:

1566222048,"2019-08-19 13:40:48",0,-5,-8.7,6,"Newark, NJ, US","i/o timeout"

CU
Jörg

Hi Jörg, tracerouting to and from the monitoring servers may help you work out where the routing’s broken. This page https://dev.ntppool.org/monitoring/network-debugging/ gives the monitoring server IPs and a tool to traceroute back to your servers (replace the 8.8.8.8 with your IP).

Hi,

here is a traceroute:

curl http://trace.ntppool.org/traceroute/85.93.91.145
Traceroute to 85.93.91.145
1 (139.178.64.41) AS54825 20.300 20.262
2 (147.75.98.106) AS54825 11.808
2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825 8.295
3 0.ae12.bbr1.ewr1.packet.net (198.16.4.88) AS54825 1.812 1.794
4 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS54825 1.273 1.265
5 (64.125.54.25) AS6461 1.961
5 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS6461 1.218
6 (64.125.54.25) AS6461 2.165 *
7 * *
8 ae27.mpr2.lhr2.uk.zip.zayo.com (64.125.30.237) AS6461 67.003 *
9 ae27.mpr2.lhr2.uk.zip.zayo.com (64.125.30.237) AS6461 67.160 67.414
10 (195.66.225.173) 71.336
10 ae11.mpr1.lhr15.uk.zip.zayo.com (64.125.30.53) AS6461 66.790
11 (195.66.225.173) 71.373
11 ae0.cr-merak.lon5.bb.godaddy.com (87.230.113.1) AS20773 72.024
12 ae0.cr-merak.lon5.bb.godaddy.com (87.230.113.1) AS20773 72.182 72.135
13 (87.230.112.3) AS20773 81.000
13 ae0.cr-nunki.sxb1.bb.godaddy.com (87.230.113.3) AS20773 81.189
14 (87.230.112.3) AS20773 81.965
14 (62.138.129.10) AS20773 84.873
15 (62.138.129.10) AS20773 85.118 83.234
16 snct3.snct-dialer.de (85.93.91.145) AS8972 80.914 80.874

With ping I get a 404:

curl http://trace.ntppool.org/ping/85.93.91.145
404 page not found

Hi,

I’m trying to debug problems with the monitoring of my server. When I try trace.ntppool.org/traceroute using curl, it doesn’t seem to work correctly:

curl https://trace.ntppool.org/traceroute/62.197.224.14

Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  19.720  19.683
 2 0.xe-0-0-17.dsr2.ewr1.packet.net (147.75.98.106) AS54825  17.796  17.776
 3 (198.16.4.86) AS54825  0.540
 3 0.ae11.bbr1.ewr1.packet.net (198.16.4.84) AS54825  0.444
iwik@[avenger]:~$ curl https://trace.ntppool.org/traceroute/62.197.224.14
Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  11.906  11.869
 2 (147.75.98.104) AS54825  15.445
 2 0.xe-0-0-17.dsr2.ewr1.packet.net (147.75.98.106) AS54825  12.283
 3 (198.16.4.88) AS54825  0.516
 3 0.ae22.bbr2.ewr1.packet.net (198.16.4.90) AS54825  0.464
iwik@[avenger]:~$ curl https://trace.ntppool.org/traceroute/62.197.224.14
Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  14.637  14.591
 2 (147.75.98.106) AS54825  29.749
 2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825  13.531
iwik@[avenger]:~$ curl https://trace.ntppool.org/traceroute/62.197.224.14
Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  21.024  20.988
 2 (147.75.98.104) AS54825  20.128
 2 0.xe-0-0-17.dsr2.ewr1.packet.net (147.75.98.106) AS54825  20.041
 3 0.ae12.bbr1.ewr1.packet.net (198.16.4.88) AS54825  0.421  0.413
iwik@[avenger]:~$ curl https://trace.ntppool.org/traceroute/62.197.224.14
Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  11.749  11.705
 2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825  12.831  12.809

From a web browser the output is better:

Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  42.988  42.949
 2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825  21.007  20.992
 3 0.ae12.bbr1.ewr1.packet.net (198.16.4.88) AS54825  0.453  0.446
 4 (198.16.4.81) AS54825  0.269
 4 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS54825  1.090
 5 (64.125.54.25) AS6461  1.187
 5 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS6461  1.280
 6 64.125.54.25.available.above.net (64.125.54.25) AS6461  *  1.121
 7  *  *
 8 (64.125.29.127) AS6461  79.624
 8 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  79.210
 9 (64.125.29.16) AS6461  79.328
 9 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  79.201
10 (64.125.29.16) AS6461  79.188
10 ae0.cs1.ams17.nl.eth.zayo.com (64.125.29.81) AS6461  80.876
11 (64.125.29.58) AS6461  79.368
11 ae0.cs1.ams17.nl.eth.zayo.com (64.125.29.81) AS6461  80.699
12 ae0.cs1.fra9.de.eth.zayo.com (64.125.29.55) AS6461  79.216  79.319
13 (64.125.29.55) AS6461  79.301
13 ae27.mpr1.fra4.de.zip.zayo.com (64.125.30.255) AS6461  81.376
14 gwup.dc.ba.gts.sk (80.81.193.195)  96.288  97.063
15 (62.168.99.66) AS5578  98.704
15 gwup.dc.ba.gts.sk (80.81.193.195) AS5578  97.166
16 (195.168.61.115) AS5578  96.726
16 se-0-1-0-0.gwa.husarik.ca.gts.sk (62.168.99.66) AS5578  98.820
17 (62.197.224.14) AS16160  100.444
17 b3.ibm.ke.cust.gts.sk (195.168.61.115) AS16160  96.700
18  *  *
19  *  *

Next try:

Traceroute to 62.197.224.14
 1 (139.178.64.41) AS54825  10.950  10.902
 2 0.xe-0-0-17.dsr1.ewr1.packet.net (147.75.98.104) AS54825  12.076  12.060
 3 0.ae22.bbr2.ewr1.packet.net (198.16.4.90) AS54825  1.154  0.886
 4 (198.16.4.81) AS54825  0.486
 4 39.ae39.bbr1.jfk3.packet.net (192.80.8.99) AS54825  1.121
 5 (192.80.8.99) AS54825  1.107
 5 64.125.54.25.available.above.net (64.125.54.25) AS54825  1.149
 6 64.125.54.25.available.above.net (64.125.54.25) AS6461  1.129  1.341
 7  *  *
 8 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  79.803  *
 9 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  77.266  77.432
10 (64.125.29.16) AS6461  77.468
10 ae0.cs1.ams17.nl.eth.zayo.com (64.125.29.81) AS6461  77.393
11 ae0.cs1.ams17.nl.eth.zayo.com (64.125.29.81) AS6461  77.359  77.547
12 (64.125.29.55) AS6461  101.687
12 ae2.cs1.fra6.de.eth.zayo.com (64.125.29.58) AS6461  77.459
13 (64.125.30.255) AS6461  77.253
13 ae0.cs1.fra9.de.eth.zayo.com (64.125.29.55) AS6461  92.128
14 gwup.dc.ba.gts.sk (80.81.193.195)  90.545  90.271
15 (62.168.99.66) AS5578  90.374
15 gwup.dc.ba.gts.sk (80.81.193.195) AS5578  90.226
16 (62.168.99.66) AS5578  90.242
16 b3.ibm.ke.cust.gts.sk (195.168.61.115) AS5578  89.786
17 (195.168.61.115) AS5578  89.767
17 cloud.zazezi.net (62.197.224.14) AS5578  94.957
18 cloud.zazezi.net (62.197.224.14) AS16160  93.752  93.786

According to this, the routing on the first attempt looks a bit strange towards the end?
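One way to compare runs like these without eyeballing every hop is to reduce each trace to its AS path. A rough sketch (my own; the regex assumes the trace.ntppool.org line format pasted above):

```python
import re

# Matches lines like " 8 ae0.cs1.lhr15.uk.eth.zayo.com (64.125.29.119) AS6461  79.210"
# or " 1 (139.178.64.41) AS54825  19.720". The ASN group is optional because
# some hops are printed without one.
HOP_RE = re.compile(r"^\s*\d+\s+\S*\s*\((\d+\.\d+\.\d+\.\d+)\)\s*(AS\d+)?")

def as_path(trace_text):
    """Collapse a traceroute into the sequence of distinct ASNs it crosses."""
    path = []
    for line in trace_text.splitlines():
        m = HOP_RE.match(line)
        if m and m.group(2) and (not path or path[-1] != m.group(2)):
            path.append(m.group(2))
    return path
```

Then `as_path(run1) == as_path(run2)` tells you quickly whether two attempts crossed the same providers.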

Hi @jff, your server is currently showing a steady score over 10 and seems to be missing only a few monitoring pings, so it’s in the pool at the moment. The problem with internet routing is that it changes, so unless you happen to catch it while it’s not responding, it’s impossible to work out where the issue is… :frowning:

@iwik, I’m seeing similar drops at packet.net - maybe it’s worth an email to them (or your ISP) to see if they can investigate.

I’m seeing the same, much less reliable monitoring from NJ. (Still reachable fine in my own testing from various locations around the internet.)

The same problem here. I have an NTP server with an IPv6 interface. For the last two days ( https://www.ntppool.org/scores/2a00:6d40:60:80a0::1 ) my server has been getting reachability errors from the Newark, NJ monitoring station. I think there is a problem with that monitoring station. The NTP status page ( https://status.ntppool.org/ ) shows a dramatic drop in IPv6 servers. The problem may be only on the IPv6 network side of the monitoring station.

Yes, I should have specified IPv6. https://www.ntppool.org/zone/north-america says 40% of NA IPv6 servers disappeared since yesterday…

Hi, one of the volunteer admins here. We’re seeing issues with IPv6 monitoring at the moment - I have flagged them to @ask to take a look when he’s next online.

My servers haven’t had monitoring problems over the last few months, as far as I know, but they do now.

https://www.ntppool.org/scores/2600:1f16:ec6:ec6c:72a0:38cf:14ef:8bc0
https://web.beta.grundclock.com/scores/2600:1f16:ec6:ec6c:72a0:38cf:14ef:8bc0
https://web.beta.grundclock.com/scores/13.58.6.55 (for comparison)

I don’t think there are problems on my end, but I can’t say for certain.

I don’t think there was – it affected ~25% of all the IPv6 servers. :frowning:

One interesting thing to learn from this is that two monitoring systems aren’t enough to be helpful for this sort of thing, at least with the current scoring algorithm.

Maybe the system could have turned off the misbehaving monitor, but in the past we’ve seen major IPv6 problems that really were real.

One example I remember was one of the big European IXes null-routing all IPv6 traffic (or maybe it was just NTP traffic?) for a day. That could have looked similar to the system, and I don’t know if the right response would be to not believe the monitor in that case.

So … we need more than two watches to tell the time, which I guess isn’t a surprise to this community! :smile:
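To make the “more watches” point concrete, a toy sketch (explicitly not the pool’s actual scoring algorithm):

```python
# Toy illustration only; this is not the pool's scoring algorithm.
# With two monitors, one faulty monitor is indistinguishable from a real
# outage; from three upwards, a strict majority can outvote one bad
# vantage point.
def looks_reachable(monitor_results):
    """monitor_results: one boolean per monitor (True = probe succeeded)."""
    return sum(monitor_results) * 2 > len(monitor_results)

print(looks_reachable([True, False]))        # 2 monitors, one bad: no majority -> False
print(looks_reachable([True, True, False]))  # 3 monitors: bad one is outvoted -> True
```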

Hi @ask,

Do you need a VM in Switzerland for monitoring? Could it help the community?

Thanks,
Cheers

Today we again had packet loss to the range 62.197.192.0/18.
[screenshot: ntp-packet-lost]

My Zabbix monitoring shows some issues at Zayo and packet.net:
2019-09-17 09:21:58
HOST: zabbix.iwik.org Loss% Snt Last Avg Best Wrst StDev

  1. iwik-home-vl100.lan.iwik.org 0.0% 3 0.2 0.3 0.2 0.3 0.0
  2. 62.197.224.1 0.0% 3 2.1 2.8 2.1 3.8 0.9
  3. b1.ibm.ke.cust.gts.sk 0.0% 3 5.2 4.5 2.7 5.4 1.5
  4. 62.168.99.154 0.0% 3 4.1 4.2 3.0 5.5 1.2
  5. nix2.zayo.com 0.0% 3 16.4 17.1 16.4 17.5 0.7
  6. ae8.mpr1.fra3.de.zip.zayo.co 0.0% 3 14.8 16.3 14.8 17.2 1.3
  7. ae27.cs1.fra6.de.eth.zayo.co 0.0% 3 93.8 93.8 93.6 94.0 0.2
  8. ae2.cs1.ams17.nl.eth.zayo.co 0.0% 3 93.8 93.7 93.3 93.9 0.3
  9. ae0.cs1.ams10.nl.eth.zayo.co 0.0% 3 93.9 93.3 92.1 93.9 1.0
  10. ae2.cs1.lhr15.uk.eth.zayo.co 0.0% 3 92.8 92.3 90.4 93.8 1.8
  11. ae0.cs1.lhr11.uk.eth.zayo.co 0.0% 3 93.8 92.2 91.4 93.8 1.4
  12. ae5.cs1.lga5.us.eth.zayo.com 0.0% 3 105.5 112.0 93.5 137.0 22.5
  13. ae15.er1.lga5.us.zip.zayo.co 0.0% 3 93.1 92.8 91.9 93.5 0.8
  14. 64.125.54.26 0.0% 3 93.2 92.2 91.4 93.2 0.9
  15. 39.ae32.bbr2.ewr1.packet.net 0.0% 3 93.9 93.1 91.0 94.3 1.8
  16. 0.ae22.dsr2.ewr1.packet.net 0.0% 3 101.3 109.5 101.3 114.6 7.2
  17. 139.178.64.41 33.3% 3 105.2 109.5 105.2 113.7 6.0

2019-09-17 09:20:58
HOST: zabbix.iwik.org Loss% Snt Last Avg Best Wrst StDev

  1. iwik-home-vl100.lan.iwik.org 0.0% 3 0.2 0.3 0.2 0.3 0.0
  2. 62.197.224.1 0.0% 3 4.0 1.9 0.8 4.0 1.8
  3. b1.ibm.ke.cust.gts.sk 0.0% 3 3.3 3.6 2.6 4.9 1.2
  4. 62.168.99.154 0.0% 3 4.0 4.0 2.4 5.5 1.6
  5. nix2.zayo.com 0.0% 3 17.4 17.7 15.7 19.9 2.1
  6. ae8.mpr1.fra3.de.zip.zayo.co 0.0% 3 18.2 16.5 14.4 18.2 1.9
  7. ae27.cs1.fra6.de.eth.zayo.co 0.0% 3 93.9 93.7 93.6 93.9 0.2
  8. ae2.cs1.ams17.nl.eth.zayo.co 0.0% 3 94.1 93.5 90.5 96.0 2.8
  9. ae0.cs1.ams10.nl.eth.zayo.co 0.0% 3 92.5 91.6 90.3 92.5 1.2
  10. ae2.cs1.lhr15.uk.eth.zayo.co 0.0% 3 94.2 93.6 92.9 94.2 0.6
  11. ae0.cs1.lhr11.uk.eth.zayo.co 0.0% 3 90.3 91.6 90.3 93.8 1.9
  12. ae5.cs1.lga5.us.eth.zayo.com 33.3% 3 93.0 93.8 93.0 94.6 1.1
  13. ae15.er1.lga5.us.zip.zayo.co 0.0% 3 91.3 92.6 91.3 94.0 1.4
  14. 64.125.54.26 0.0% 3 90.1 104.0 90.1 128.7 21.5
  15. 39.ae32.bbr2.ewr1.packet.net 0.0% 3 92.4 92.2 91.9 92.4 0.3
  16. 0.ae22.dsr2.ewr1.packet.net 0.0% 3 110.7 109.8 109.2 110.7 0.8
  17. 139.178.64.41 33.3% 3 114.4 111.7 109.0 114.4 3.8

2019-09-17 09:19:58
HOST: zabbix.iwik.org Loss% Snt Last Avg Best Wrst StDev

  1. iwik-home-vl100.lan.iwik.org 0.0% 3 0.2 0.2 0.2 0.3 0.0
  2. 62.197.224.1 0.0% 3 3.6 3.4 3.3 3.6 0.2
  3. b1.ibm.ke.cust.gts.sk 0.0% 3 2.1 3.2 1.8 5.6 2.1
  4. 62.168.99.154 0.0% 3 5.1 4.7 4.1 5.1 0.5
  5. nix2.zayo.com 0.0% 3 16.3 15.7 14.3 16.5 1.2
  6. ae8.mpr1.fra3.de.zip.zayo.co 0.0% 3 17.8 16.5 15.1 17.8 1.4
  7. ae27.cs1.fra6.de.eth.zayo.co 0.0% 3 93.9 93.0 91.8 93.9 1.1
  8. ae2.cs1.ams17.nl.eth.zayo.co 0.0% 3 98.1 93.3 90.7 98.1 4.2
  9. ae0.cs1.ams10.nl.eth.zayo.co 0.0% 3 92.9 95.9 90.8 103.8 7.0
  10. ae2.cs1.lhr15.uk.eth.zayo.co 0.0% 3 93.0 93.0 92.2 93.9 0.8
  11. ae0.cs1.lhr11.uk.eth.zayo.co 0.0% 3 94.0 92.5 91.5 94.0 1.3
  12. ae5.cs1.lga5.us.eth.zayo.com 0.0% 3 96.6 95.1 92.7 96.6 2.1
  13. ae15.er1.lga5.us.zip.zayo.co 0.0% 3 93.1 93.3 93.1 93.8 0.4
  14. 64.125.54.26 0.0% 3 90.1 92.2 90.1 93.4 1.8
  15. 39.ae32.bbr2.ewr1.packet.net 0.0% 3 96.0 93.8 91.1 96.0 2.5
  16. 0.ae22.dsr2.ewr1.packet.net 33.3% 3 107.3 106.6 105.9 107.3 1.0
  17. 139.178.64.41 33.3% 3 108.9 105.7 102.4 108.9 4.6

2019-09-17 09:18:57
HOST: zabbix.iwik.org Loss% Snt Last Avg Best Wrst StDev

  1. iwik-home-vl100.lan.iwik.org 0.0% 3 0.3 0.3 0.2 0.4 0.1
  2. 62.197.224.1 0.0% 3 4.2 2.9 1.2 4.2 1.6
  3. b1.ibm.ke.cust.gts.sk 0.0% 3 5.3 4.9 3.4 6.0 1.3
  4. 62.168.99.154 0.0% 3 5.4 5.1 4.8 5.4 0.3
  5. nix2.zayo.com 0.0% 3 16.6 16.7 16.3 17.2 0.5
  6. ae8.mpr1.fra3.de.zip.zayo.co 0.0% 3 15.1 16.6 15.1 17.7 1.4
  7. ae27.cs1.fra6.de.eth.zayo.co 0.0% 3 94.1 93.3 92.3 94.1 0.9
  8. ae2.cs1.ams17.nl.eth.zayo.co 0.0% 3 93.2 98.7 93.2 109.2 9.2
  9. ae0.cs1.ams10.nl.eth.zayo.co 0.0% 3 94.2 93.8 93.0 94.2 0.7
  10. ae2.cs1.lhr15.uk.eth.zayo.co 0.0% 3 92.3 93.9 92.3 95.3 1.5
  11. ae0.cs1.lhr11.uk.eth.zayo.co 0.0% 3 93.7 93.7 93.6 93.8 0.1
  12. ae5.cs1.lga5.us.eth.zayo.com 66.7% 3 93.7 93.7 93.7 93.7 0.0
  13. ae15.er1.lga5.us.zip.zayo.co 0.0% 3 93.1 94.5 93.1 96.5 1.8
  14. 64.125.54.26 0.0% 3 92.1 92.7 92.1 93.8 0.9
  15. 39.ae32.bbr2.ewr1.packet.net 0.0% 3 94.0 94.0 93.6 94.5 0.5
  16. 0.ae22.dsr2.ewr1.packet.net 0.0% 3 106.9 107.9 106.9 108.5 0.8
  17. 139.178.64.41 33.3% 3 111.2 105.4 99.5 111.2 8.3

I have opened a ticket with Packet to check.
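For what it’s worth, the Loss% column in a report like the one above can be pulled out programmatically. A quick sketch of my own, assuming the column layout shown; note that loss at an intermediate hop often just means that router deprioritizes TTL-expired replies, while loss that persists through to the final hop is the meaningful signal.

```python
def lossy_hops(mtr_report):
    """Return (host, loss_percent) for every hop showing non-zero loss.

    Assumes the report layout pasted above: hop-number, host, Loss%, Snt,
    Last, Avg, Best, Wrst, StDev. Header and timestamp lines are skipped.
    """
    hops = []
    for line in mtr_report.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].rstrip(".").isdigit() and parts[2].endswith("%"):
            loss = float(parts[2].rstrip("%"))
            if loss > 0.0:
                hops.append((parts[1], loss))
    return hops
```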

Same here: dropped from the pool, but IPv6 connectivity is fine from various test points. Currently there are 15% fewer US IPv6 servers in the pool than yesterday…

Hi, the zone page is currently showing “214 (+3) active 1 day ago”, but it’s definitely worth logging a ticket with your ISP if you’re seeing problems.

mtr --udp -P 123 2604:1380:2:6000::15, run from your NTP server, will test the route to port 123 of the monitoring server over UDP. (Bear in mind that some packet loss is expected with UDP as opposed to TCP.)

(For IPv4 the command is mtr --udp -P 123 139.178.64.42.)
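If mtr isn’t handy, a raw SNTP query gives a similar end-to-end check over UDP/123. A minimal sketch of my own (nothing pool-specific; point it at your own server from an outside host to confirm it answers):

```python
import socket
import struct
import time

NTP_UNIX_DELTA = 2208988800  # seconds between the NTP era (1900) and Unix epoch (1970)

def build_query():
    # First byte packs LI=0, VN=4, Mode=3 (client): 0b00_100_011
    return bytes([0b00100011]) + b"\x00" * 47

def parse_tx_seconds(reply):
    # The transmit timestamp's whole-seconds field is bytes 40-43, big-endian.
    return struct.unpack("!I", reply[40:44])[0] - NTP_UNIX_DELTA

def sntp_probe(host, timeout=2.0):
    """Send one client query to UDP/123; return (rtt_seconds, server_unix_time)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        start = time.monotonic()
        s.sendto(build_query(), (host, 123))
        reply, _ = s.recvfrom(512)
    return time.monotonic() - start, parse_tx_seconds(reply)

# Example (substitute your own server's address):
# rtt, server_time = sntp_probe("85.93.91.145")
```

For an IPv6 target, swap AF_INET for AF_INET6.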

There really isn’t any point in logging a ticket with my ISP. Yes, a few servers came back, but 30-some that have been around for a long time are gone. From various other machines on the internet the connectivity is fine (they’re syncing NTP without problems), but the monitoring system clearly has issues. Again. When that many servers all have problems at the same time, and random test sites see them fine, the issue is the monitor, not all of the other servers.

FWIW, I have zero loss right up until it falls apart at 2604:1380:2:63::1. Not much my ISP can do about that. There’s also horrible latency & jitter once the trace reaches 2604:1380::

I disagree with your conclusion that “the issue is the monitor, not all of the other servers”. Unless “all the other servers” are connected directly to the monitor, there’s some amount of internet between each server and the monitor.

People add servers to the pool and take servers out of the pool all of the time. At the North America level there are 11% fewer IPv6 servers than 14 days ago, but at the global level it’s 6%. Both levels have more active IPv6 servers than yesterday. That doesn’t sound like an issue at the monitor end. Do you have specific knowledge of the 30-some servers and why they’re no longer in the pool? As volunteer admins we (currently) have no access to any more information than you do.

NTP uses UDP packets, which aren’t guaranteed to be delivered. Routers may be overloaded and drop packets. They may have traffic shaping or limiting rules in place. A router may be misconfigured.

Tracerouting to your server from the monitor three times gave me three different routes of either 7 or 8 hops; none of the traceroutes reach as far as your server. In two cases the last hop has “tunnel” in the name, which implies the route might then be tunnelled.
For me the available evidence points to a routing issue between you and the monitor. As you are the one with the relationship with your ISP to deliver traffic, my suggestion is for you to log a ticket with them, as they will have access to diagnostics that neither you nor I have.
For me the available evidence points to the issue being a routing issue between you and the monitor. As you have the relationship with your ISP to deliver traffic then my suggestion is for you to log a ticket with your ISP who will have access to diagnostics that neither you or I do.