Problems with the Los Angeles IPv4 monitoring station


#1

I have run half of all the NTP servers in Costa Rica for many years now. These servers serve hundreds of devices on the internal ISP network they are part of, and thousands more outside it. For almost 2 months now, the responses they send have either not been logged by the Los Angeles monitoring station or have not reached it. This only happens over IPv4; over IPv6 it works fine.

Today I decided to take all the servers out of the NTP pool, since apparently there is no way to fix this. Ask suggested I “see if someone else can monitor the server for a few days”.

So here are 2 of them: 200.59.16.50 and 200.59.19.5

So, if anyone can monitor these servers for the next couple of days it would be very much appreciated.


#2

I have some pool servers in Europe and one of them is also monitoring a subset of other pool servers. I have some data for 200.59.16.50. It seems over the last six months the reachability was 97.4%. I think that’s pretty good. It’s actually slightly better than reachability of the IPv6 address of the server in the same period, which was 96.9%.
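For anyone wanting to compute the same kind of figure from their own probe logs, here is a minimal sketch. It assumes reachability simply means the share of probes that got a reply; the function name and the sample counts are hypothetical and are not the data behind the numbers above.

```python
def reachability(results):
    """Return the percentage of probes that received a reply.

    `results` is an iterable of booleans, one per probe (True = reply seen).
    """
    results = list(results)
    if not results:
        raise ValueError("no probes recorded")
    return 100.0 * sum(results) / len(results)


# Hypothetical sample, for illustration only: 1000 probes, 26 of them lost.
sample = [True] * 974 + [False] * 26
print(f"{reachability(sample):.1f}%")  # prints 97.4%
```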


#3

That server in particular is a Stratum 1, and NTP is its only job. It had a hardware problem at the beginning of February and was taken out for a complete hardware upgrade during March; during that time a Stratum 2 server answered the queries. So yes, 97% is certainly right considering all the hardware issues it suffered. During April it performed just fine, but you can see the huge drop in traffic, since it has been out of the pool most of the time.


#4

I did some traffic engineering and now send the replies via a different upstream provider.

The problem seems to be completely gone, but I wonder whether it was just packet loss on that link or something else. I still think that having at least 3 monitoring stations is critical to getting a better picture of what is going on. Because the AS path was shorter, the routers sent the replies over the other link by default, and changing inbound routes did not have any effect. So even though the monitoring station now receives the replies fine, others may not, and we won’t notice unless we monitor from a lot of different locations.
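If anyone wants to check from their own location, a rough sketch of a one-shot IPv4 probe follows. It is not what the pool monitor runs; it only sends a plain 48-byte client request (first byte 0x1b, i.e. LI=0, VN=3, Mode=3) and reads the stratum and transmit timestamp from the reply. The function names and the 2-second timeout are my own choices.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """Build a 48-byte client request: LI=0, VN=3, Mode=3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_reply(data):
    """Extract stratum and transmit time (Unix seconds) from a reply."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    stratum = data[1]
    # Transmit timestamp: 32-bit seconds + 32-bit fraction at bytes 40..47.
    seconds, fraction = struct.unpack("!II", data[40:48])
    return stratum, seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def probe(server, timeout=2.0):
    """Send one query over IPv4 UDP; return (stratum, time) or None if no
    usable reply arrives before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (server, 123))
            data, _ = sock.recvfrom(512)
            return parse_reply(data)
        except (OSError, ValueError):
            return None
```

Calling `probe("200.59.16.50")` repeatedly from several networks and counting the `None` results gives a crude loss figure per vantage point.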


#5

This is great. Can you tell what the upstream provider change was? I wonder if they are blocking (some?) NTP packets…


#6

The old upstream route towards the monitor went:

root@stratum1:~# traceroute 207.171.3.17 
traceroute to 207.171.3.17 (207.171.3.17), 30 hops max, 60 byte packets
 1  ether3.mikrotik-bb-zeus.fratec.net (200.59.16.33)  0.189 ms  0.160 ms  0.133 ms
 2  sfp-sfpplus1.mikrotik-bb-router-gnd1.fratec.net (200.59.17.43)  0.573 ms  0.563 ms  0.545 ms
 [AS52263 Telecable Economico S.A.]
 3  rev185.125.nstelecablecr.com (190.113.125.185)  1.201 ms  1.327 ms  1.165 ms
 4  172.31.1.9 (172.31.1.9)  1.158 ms  1.139 ms  1.195 ms
 5  190.242.134.93 (190.242.134.93)  1.133 ms  1.174 ms  1.097 ms
 [AS23520 Columbus Networks USA, Inc.]
 6  xe-4-0-0.0-corozal-pamiami.fl.us.nama.pan-corzl-mx01.cwc.com (63.245.106.167)  12.524 ms  12.421 ms  12.415 ms
 7  xe-3-0-0.0-boca-raton.fl.us.brx-teracore01.cwc.com (63.245.106.164)  62.754 ms xe-6-1-4.0-boca-raton.fl.us.nmi-teracore02.cwc.com (63.245.107.126)  44.313 ms  44.302 ms
 8  xe-0-0-45-1.a00.miamfl02.us.bb.gin.ntt.net (129.250.198.217)  59.425 ms xe-0-0-47-2.a00.miamfl02.us.bb.gin.ntt.net (128.242.180.197)  54.737 ms  54.741 ms
 [AS2914 NTT America, Inc.]
 9  ae-7.r04.miamfl02.us.bb.gin.ntt.net (129.250.2.202)  131.008 ms  140.788 ms  130.972 ms
10  ae-3.r21.miamfl02.us.bb.gin.ntt.net (129.250.4.250)  56.116 ms  54.307 ms  47.513 ms
11  ae-4.r22.dllstx09.us.bb.gin.ntt.net (129.250.2.219)  93.713 ms  87.852 ms  84.646 ms
12  ae-5.r22.lsanca07.us.bb.gin.ntt.net (129.250.7.69)  131.629 ms  118.773 ms  126.571 ms
13  ae-1.r01.lsanca07.us.bb.gin.ntt.net (129.250.3.123)  144.666 ms  128.926 ms  136.790 ms
14  te7-1.r01.lax2.phyber.com (198.172.90.74)  137.752 ms  128.881 ms  141.051 ms
15  te7-4.r02.lax2.phyber.com (207.171.30.62)  300.943 ms  300.936 ms  300.906 ms
16  ntplax7.ntppool.net (207.171.3.17)  126.467 ms !X  131.171 ms !X  118.436 ms !X 

The new and working route is:

root@stratum1:~# traceroute 207.171.3.17
traceroute to 207.171.3.17 (207.171.3.17), 30 hops max, 60 byte packets
 1  ether3.mikrotik-bb-zeus.fratec.net (200.59.16.33)  0.158 ms  0.201 ms  0.115 ms
 [AS20299 Newcom Limited]
 2  186.176.7.73 (186.176.7.73)  0.819 ms  0.816 ms  0.802 ms
 3  186.32.0.217 (186.32.0.217)  0.881 ms  0.867 ms  0.875 ms
 [AS1299 Telia Company AB]
 4  190.106.192.237 (190.106.192.237)  43.285 ms  43.254 ms  43.240 ms
 5  mai-b1-link.telia.net (62.115.52.241)  41.705 ms  44.023 ms  44.028 ms
 6  mai-b1-link.telia.net (62.115.138.160)  41.310 ms mai-b1-link.telia.net (80.91.250.236)  38.891 ms mai-b1-link.telia.net (80.91.253.220)  38.874 ms
 7  ntt-ic-321350-mai-b1.c.telia.net (213.248.81.63)  41.066 ms  42.894 ms  42.899 ms
 [AS2914 NTT America, Inc.]
 8  ae-3.r21.miamfl02.us.bb.gin.ntt.net (129.250.4.250)  43.410 ms  43.405 ms  40.982 ms
 9  ae-4.r22.dllstx09.us.bb.gin.ntt.net (129.250.2.219)  70.660 ms  75.940 ms  75.899 ms
10  ae-5.r22.lsanca07.us.bb.gin.ntt.net (129.250.7.69)  110.576 ms  100.157 ms  102.728 ms
11  ae-1.r01.lsanca07.us.bb.gin.ntt.net (129.250.3.123)  99.446 ms  103.431 ms  100.113 ms
12  te0-0-0-0.r04.lax02.as7012.net (198.172.90.74)  107.573 ms  99.088 ms  99.029 ms
13  te7-4.r02.lax2.phyber.com (207.171.30.62)  99.626 ms  99.427 ms  102.202 ms
14  ntplax7.ntppool.net (207.171.3.17)  100.043 ms !X  98.850 ms !X  100.048 ms !X

The first two transit providers are what changed, so yes, possibly AS52263 Telecable Economico S.A. or AS23520 Columbus Networks USA, Inc. is dropping NTP packets.
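The comparison can also be done mechanically. The sketch below pulls the bracketed AS annotations out of two traceroute pastes and lists the ASes that appear only in the old path. It assumes the `[ASnnnn Name]` line format used in the output above; the helper names are made up for this example, and only the annotation lines are reproduced here for brevity.

```python
import re

# Matches annotation lines such as " [AS2914 NTT America, Inc.]"
AS_LINE = re.compile(r"^\s*\[(AS\d+)\s+([^\]]+)\]\s*$")


def as_sequence(traceroute_text):
    """Return the (ASN, name) annotations in the order they appear."""
    found = []
    for line in traceroute_text.splitlines():
        match = AS_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found


def only_in_first(first, second):
    """Return the entries of `first` whose ASN does not occur in `second`."""
    second_asns = {asn for asn, _ in second}
    return [(asn, name) for asn, name in first if asn not in second_asns]


OLD_ROUTE = """\
 [AS52263 Telecable Economico S.A.]
 [AS23520 Columbus Networks USA, Inc.]
 [AS2914 NTT America, Inc.]
"""
NEW_ROUTE = """\
 [AS20299 Newcom Limited]
 [AS1299 Telia Company AB]
 [AS2914 NTT America, Inc.]
"""

suspects = only_in_first(as_sequence(OLD_ROUTE), as_sequence(NEW_ROUTE))
for asn, name in suspects:
    print(asn, name)
# prints:
# AS52263 Telecable Economico S.A.
# AS23520 Columbus Networks USA, Inc.
```

This only narrows down where the paths differ; it does not show which of the two networks, if either, is actually dropping the packets.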