Monitoring stations timeout to our NTP servers

Just two side questions:

  1. Are these IPs static? The reverse lookup shows them as a normal ISP address.
  2. Is the line symmetric or asymmetric?

It's an ISP line, but the IPs should never change; and it's an asymmetric line.

OK, I thought it was like a dial-up connection, where the IP changes.

@mby What sort of traffic levels (NTP packets per second) are you seeing during the times when the score is below 10?
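A rough way to measure that, if it helps, is to count inbound port-123 packets over a fixed window with tcpdump and divide by the window length (the interface name below is just an example):

# count NTP requests arriving over 60 seconds, then divide by 60 for packets/second
sudo timeout 60 tcpdump -lni eth0 'udp dst port 123' 2>/dev/null | wc -l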

Thanks Steve, and it is always super-low traffic; what is striking is that only the IPv4 probes break down while IPv6 is alive and kicking…
But anyway, I've monitored the accuracy of my server quite extensively and did some testing, and it does not appear to be a local problem, so I consider this closed. Thank you, everyone.

P.S.: Looking at the beta site data, I have these statistics, so it does indeed look like something I can't influence:
IPv4 monitoring stations: Newark, NJ, US (7.9); Los Angeles, CA (-49.3); Amsterdam (-0.3)
IPv6 monitoring stations: Newark, NJ, US (19.5); Amsterdam (19.4)

@mby those are delightfully consistent problems!

@stevesommars might be able to help analyze some tcpdump captures if you can work with him. He's been working out which [ network links / transit providers / etc ] are causing trouble and looking at the failure patterns.
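If it helps, a capture limited to the monitor traffic keeps the file small; something along these lines should work (the interface name and filename are just examples, and the hostname is the Newark monitor):

# capture only NTP traffic to/from the Newark monitor into a pcap for analysis
sudo tcpdump -ni eth0 -w newark-ntp.pcap 'udp port 123 and host monewr1.ntppool.net'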


Thanks @ask, I’m currently working with @stevesommars on it; we’ll keep you posted.

@ask would it be possible to get hostnames or IP addresses for the Amsterdam/L.A. monitoring stations? I'd like to compare routing issues across all 3 monitoring stations. I'm currently suffering from packets being dropped from the NJ monitor in the production pool (I believe this is Zayo, but our networks team will be able to dig deeper into this).

Any help would be great, as this is negatively affecting our polling and our NTP service is one of very few in IE.
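In the meantime, one way to watch where the loss starts on the Newark path might be mtr in UDP mode on port 123 against the Newark monitor, monewr1.ntppool.net (the flags assume a reasonably recent mtr):

mtr -n -u -P 123 monewr1.ntppool.net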

I see you deleted your servers.

UDP packets can be dropped anywhere, and it happens often.
Cisco tells people that a 5% drop rate on UDP is acceptable.

Yes, the pool score is based on one monitor and one monitor alone, and it only takes a few misses to kick you out of the pool.

As has often been said here, there should be more monitors and the score should at least be averaged across them; in fact, one monitor reporting you as GOOD should be enough.

The beta system does this, and my servers are marked as good by LA and dismissed by Newark (as usual).

Yet the system still isn't fixed. It has been 6 months now… the beta system is 10,000,000x better than this.

No @Ask, there is no NTP filtering; it's simply not true. ISPs use NTP themselves to keep time!

@HEAnet LA is 45.54.12.11, the Amsterdam one is monams1.ntppool.net.


If someone says that, I'd like to see them configure their computer to drop 5% of UDP traffic and then open a web page that requires 300 DNS queries, or watch YouTube over QUIC.
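For what it's worth, anyone curious what 5% loss actually feels like can approximate it on a Linux test box with netem and then try normal browsing (the interface name is an example; note that netem applies per interface, not per protocol, so this drops all traffic, not just UDP):

# drop 5% of outgoing packets on eth0
sudo tc qdisc add dev eth0 root netem loss 5%
# remove the rule again afterwards
sudo tc qdisc del dev eth0 root netem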

That statement makes several assumptions. Network-wide blocking isn’t common, but rate limiting is something else.


Not sure what's going on.

Looking at this section of my report

1583264160,"2020-03-03 19:36:00",0,-5,1.1,6,"Newark, NJ, US","i/o timeout"
1583264160,"2020-03-03 19:36:00",0,-5,1.1,"i/o timeout"
1583263174,"2020-03-03 19:19:34",0.000532588,1,6.4,6,"Newark, NJ, US",0,
1583263174,"2020-03-03 19:19:34",0.000532588,1,6.4,0,

I can see 4 entries: 2 at 19:19 and 2 at 19:36 (which apparently didn't get a response).

I received and replied to 3 requests at 19:19 (not 2; not sure what's going on with that?)

19:19:24.843963 IP monewr1.ntppool.net.46571 > raspberrypi.local.ntp: NTPv4, Client, length 48
19:19:24.844086 IP raspberrypi.local.ntp > monewr1.ntppool.net.46571: NTPv4, Server, length 48
19:19:29.842189 IP monewr1.ntppool.net.43195 > raspberrypi.local.ntp: NTPv4, Client, length 48
19:19:29.842356 IP raspberrypi.local.ntp > monewr1.ntppool.net.43195: NTPv4, Server, length 48
19:19:31.922697 IP monewr1.ntppool.net.41698 > raspberrypi.local.ntp: NTPv4, Client, length 48
19:19:31.922779 IP raspberrypi.local.ntp > monewr1.ntppool.net.41698: NTPv4, Server, length 48

Looking at the tcpdump, it seems the monitor actually sent 3 requests and I sent 3 responses at 19:35, but they did not make it back to the monitor. And why does it only show 2 in the logs on the monitor?

19:35:47.464786 IP monewr1.ntppool.net.48228 > raspberrypi.local.ntp: NTPv4, Client, length 48
19:35:47.464863 IP raspberrypi.local.ntp > monewr1.ntppool.net.48228: NTPv4, Server, length 48
19:35:52.465304 IP monewr1.ntppool.net.50627 > raspberrypi.local.ntp: NTPv4, Client, length 48
19:35:52.465422 IP raspberrypi.local.ntp > monewr1.ntppool.net.50627: NTPv4, Server, length 48
19:35:57.465701 IP monewr1.ntppool.net.40515 > raspberrypi.local.ntp: NTPv4, Client, length 48
19:35:57.465848 IP raspberrypi.local.ntp > monewr1.ntppool.net.40515: NTPv4, Server, length 48

While this was running I had a ping/traceroute running every 5 seconds, which reported no changes and seemingly no dropped packets, other than at some hops, which may just be ICMP rate limiting.

On the beta site, the most reliable monitors for me seem to be:

  1. LA
  2. Amsterdam
  3. Newark

Which is strange in itself for LA and Amsterdam; considering I'm in the UK, I thought Amsterdam would be the most reliable!
https://web.beta.grundclock.com/scores/81.174.133.68

Traceroutes to all 3, in the order Newark, Amsterdam, LA:

Tracing route to monewr1.ntppool.net [139.178.64.42]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms router.local [192.168.93.1]
2 10 ms 10 ms 10 ms 195.166.130.248
3 23 ms 11 ms 11 ms 84.93.253.71
4 11 ms 11 ms 10 ms 195.99.125.140
5 10 ms 11 ms 11 ms core2-hu0-1-0-1-1.colindale.ukcore.bt.net [195.99.127.9]
6 11 ms 11 ms 11 ms 109.159.252.134
7 11 ms 10 ms 11 ms 166-49-209-194.eu.bt.net [166.49.209.194]
8 * * * Request timed out.
9 29 ms 12 ms 12 ms ae11.mpr2.lhr2.uk.zip.zayo.com [64.125.30.52]
10 * * * Request timed out.
11 83 ms 77 ms 78 ms ae5.cs3.lga5.us.eth.zayo.com [64.125.29.126]
12 78 ms 77 ms 77 ms ae15.er1.lga5.us.zip.zayo.com [64.125.29.221]
13 80 ms 79 ms 79 ms 64.125.54.26.available.above.net [64.125.54.26]
14 80 ms 80 ms 80 ms 0.et-0-0-1.bsr2.ewr1.packet.net [198.16.6.237]
15 95 ms 101 ms 98 ms 0.ae2.dsr2.ewr1.packet.net [198.16.4.215]
16 95 ms 98 ms 98 ms 147.75.98.107
17 82 ms 82 ms 82 ms monewr1.ntppool.net [139.178.64.42]

Trace complete.

C:\Users\Daniel>tracert monams1.ntppool.net

Tracing route to monams1.ntppool.net [147.75.84.170]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms router.local [192.168.93.1]
2 10 ms 10 ms 10 ms 195.166.130.248
3 11 ms 11 ms 11 ms 84.93.253.71
4 11 ms 11 ms 10 ms 195.99.125.140
5 11 ms 11 ms 11 ms peer7-et-4-1-1.telehouse.ukcore.bt.net [194.72.16.134]
6 12 ms 14 ms 11 ms 166-49-128-32.eu.bt.net [166.49.128.32]
7 12 ms 12 ms 11 ms ldn-b1-link.telia.net [213.248.97.48]
8 18 ms 19 ms 18 ms ldn-bb3-link.telia.net [62.115.114.234]
9 18 ms 18 ms 18 ms adm-bb3-link.telia.net [213.155.136.99]
10 18 ms 20 ms 18 ms adm-b1-link.telia.net [62.115.136.195]
11 18 ms 18 ms 28 ms packethost-ic-346116-adm-b4.c.telia.net [62.115.176.233]
12 39 ms 32 ms 118 ms 198.16.6.37
13 17 ms 19 ms 17 ms 147.75.84.170

Trace complete.

C:\Users\Daniel>tracert 45.54.12.11

Tracing route to 11.12.54.45.ptr.anycast.net [45.54.12.11]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms router.local [192.168.93.1]
2 10 ms 22 ms 10 ms 195.166.130.248
3 10 ms 10 ms 11 ms 84.93.253.67
4 11 ms 11 ms 10 ms ^C
C:\Users\Daniel>tracert 45.54.12.11

Tracing route to 11.12.54.45.ptr.anycast.net [45.54.12.11]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms router.local [192.168.93.1]
2 10 ms 10 ms 10 ms 195.166.130.248
3 15 ms 11 ms 11 ms 84.93.253.67
4 11 ms 11 ms 29 ms 195.99.125.136
5 39 ms 18 ms 11 ms peer7-et-4-1-4.telehouse.ukcore.bt.net [194.72.16.140]
6 25 ms 11 ms 11 ms 166-49-128-32.eu.bt.net [166.49.128.32]
7 11 ms 29 ms 12 ms hu0-6-0-4.ccr22.lon01.atlas.cogentco.com [130.117.14.65]
8 39 ms 21 ms 12 ms be2870.ccr41.lon13.atlas.cogentco.com [154.54.58.173]
9 102 ms 135 ms 90 ms be12497.ccr41.par01.atlas.cogentco.com [154.54.56.130]
10 112 ms 115 ms 100 ms be3627.ccr41.jfk02.atlas.cogentco.com [66.28.4.197]
11 99 ms 98 ms 98 ms be2806.ccr41.dca01.atlas.cogentco.com [154.54.40.106]
12 109 ms 109 ms 109 ms be2112.ccr41.atl01.atlas.cogentco.com [154.54.7.158]
13 119 ms 117 ms 118 ms be2687.ccr41.iah01.atlas.cogentco.com [154.54.28.70]
14 172 ms 132 ms 181 ms be2927.ccr21.elp01.atlas.cogentco.com [154.54.29.222]
15 146 ms 151 ms 153 ms be2930.ccr32.phx01.atlas.cogentco.com [154.54.42.77]
16 180 ms 152 ms 152 ms be2932.ccr42.lax01.atlas.cogentco.com [154.54.45.162]
17 159 ms 158 ms 159 ms be2199.rcr21.b020604-0.lax01.atlas.cogentco.com [154.54.2.174]
18 152 ms 154 ms 152 ms 38.140.153.10
19 * * * Request timed out.
20 152 ms 152 ms 152 ms 192.73.255.218
21 152 ms 151 ms 168 ms 11.12.54.45.ptr.anycast.net [45.54.12.11]

Trace complete.


Some (many?) ISPs/IXPs are filtering UDP port 123 based on size, rate, and possibly other characteristics. By default traceroute does not use UDP port 123, so it often fails to detect filtering. On Linux I typically do:
traceroute -n -U -p 123 IP_address

Either the NTP request or NTP response may be lost.
A port 123 traceroute running from the NTP client may detect NTP request (mode 3) loss. A port 123 traceroute running from the NTP server may detect NTP response (mode 4) loss.

Most of the NTP packet losses (timeouts) that I’ve worked on recently have been in the NTP server -> Newark direction.
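For example, to probe the server -> Newark direction one can run the port-123 traceroute above from the NTP server itself towards the Newark monitor (address as seen elsewhere in this thread):
traceroute -n -U -p 123 139.178.64.42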

It's nice to detect which ISP/IXP drops the NTP packets, but that hasn't helped so far; I can't get any response from them.


ISPs do not filter UDP NTP packets.
Dan is having the same problems as a lot of us.

Please stop this nonsense that ISPs drop packets on purpose; they do not.

ISPs use NTP themselves; it would be stupid to filter it.

While that may be true, the internet is never constant. It is possible that some network has deployed some form of DDoS protection (intentionally rate limiting UDP) or is under an active DDoS attack (see https://www.digitalattackmap.com/ and https://horizon.netscout.com/?filters=trigger.triggerName.UDP). It is also possible that network operators make a mistake from time to time, or simply that the buffer on some random piece of equipment is full and a packet has to be dropped. Routing issues… network upgrades…
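Purely as an illustration of what such a defensive rate limit might look like (not a claim about any specific network; the chain, threshold, and rule name here are made up), on a Linux router it could be a single iptables rule:

# hypothetical example: drop NTP responses from any source sending more than 10 packets/second
iptables -A FORWARD -p udp --sport 123 -m hashlimit --hashlimit-name ntp-limit --hashlimit-mode srcip --hashlimit-above 10/second -j DROP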


The problem is that this system is badly managed: the monitor is dropping everybody while being the problem itself.
The beta-system has proven this over and over again.
Management of this pool does nothing.

They always tell you: It’s you OR your provider.

ntppool.org is the problem and after 6 months it’s still not fixed.

The question is: How much time does @Ask need to fix it?
Can he fix it?

The beta system is better, but it never seems to replace the “normal” system.

P.S.: My servers have been deleted for a second time; I'm fed up with this crap. They can be used via my own pool, ntp.heppen.be, which is also round-robin and spot on time.

If he were the collective CTO of all the backbone ISPs that rate-limit NTP traffic, I imagine he would have adjusted their policies by now.


None of my servers have issues… Saying “everybody” is a very broad and inaccurate statement…


Hi,

I can see you're hosting that on Zen; I'm with Plusnet but have the issue a lot worse than you (mine never gets into the pool).

If you compare with my traceroutes in Monitoring stations timeout to our NTP servers, which hops do we seem to have in common?