Problems with the Los Angeles IPv4 Monitoring Station

I’ve had problems recently with IPv4 UDP (incl. NTP and DNS) transit through AS6939 “Hurricane Electric LLC” - my short story is written here:

When did all of the dropping begin? I was just wondering if it’s a result of the NTP reflection issue that occurred.

In my case, the HE tunnel problems began at the end of February, and by the middle of March I just killed the BGP session since it was not getting better. For now, native IPv6 traffic to the Los Angeles monitoring station is flowing fine, but IPv4 is not.

Same server, but IPv4:

Had this issue recently at different times with different servers, which are not part of the pool.
Our monitoring system got no answer to its NTP queries, so I started capturing traffic on both ends.
The NTP server sent a valid reply, but it simply got lost. Other UDP services on the same machine were working fine.
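
For reference, a capture along these lines on each end is enough to see the reply leaving the server and never arriving at the client (the interface name is just an example):

~# tcpdump -ni eth0 udp port 123    # -n: no name resolution, -i: capture interface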

The packets traversed the networks of TATA Communications (AS6453) and then Cogent (AS174).
We did not manage to pinpoint the exact location of the culprit (traceroutes and MTRs on port 123 showed no loss).
But we eventually changed our BGP advertisements via Cogent to make them less attractive; packets then flowed via Level3/CenturyLink and the problems were gone.

The weird thing is that it is specifically NTP packets that get eaten. We stopped NTP, ran the “echo” service on that port and tried to measure packet loss - but there was no loss!
With NTP, the server’s egress packets were eaten somewhere when routed via TATA/Cogent.
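
In case anyone wants to reproduce that test: with ntpd stopped, a throwaway UDP “echo” on port 123 can be run with socat, for example (just one way to do it, not necessarily how we did it back then):

~# socat UDP4-LISTEN:123,fork PIPE    # reflect every incoming datagram back to its sender

Loss can then be measured by comparing the packets sent with the echoes received, e.g. with tcpdump running on the client.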

At the moment, I suspect a spreading firmware bug that leads to NTP packets being discarded.
Something like this happened several years ago (well, kind of), and maybe it’s coming around again:
https://news.ntppool.org/2013/06/ipv6-monitoring-problems-for-german-servers/

I’m having trouble with Cogent too. Have you tried traceroute/mtr with source port 123, not just the destination port? On two servers in two different countries that recently became unusable, I can see significant packet loss at the border of the Cogent network. In the Italian zone it looks like this has caused a ~50% drop in the number of active servers.

That has been tested: when I started the “echo” service on that port to check whether I get responses to my packets, all those replies were of course sent with source port 123 :wink:

I ran a traceroute from my site to an Italian pool server that I reach via Cogent.
The last hop is the NTP server itself and does not reply, because I’m not sending valid NTP packets.

# traceroute -A -f 5 -U -p 123 --sport=123 -q 10 '212.45.144.88' '48'
traceroute to 212.45.144.88 (212.45.144.88), 30 hops max, 48 byte packets
 5  te0-7-0-5.rcr21.dus01.atlas.cogentco.com (149.6.139.113) [AS174]  1.170 ms  1.613 ms  1.722 ms  1.372 ms  1.103 ms  1.069 ms  1.137 ms  1.087 ms  1.076 ms  1.099 ms
 6  be2115.agr41.fra03.atlas.cogentco.com (154.54.37.37) [AS174]  5.037 ms  4.967 ms  5.104 ms  6.301 ms  4.889 ms  5.406 ms  4.832 ms  4.747 ms  5.062 ms  4.965 ms
 7  be3187.ccr42.fra03.atlas.cogentco.com (130.117.1.118) [AS174]  4.849 ms  5.197 ms  4.829 ms  5.310 ms  4.802 ms  5.708 ms  5.164 ms  5.109 ms  5.119 ms  4.864 ms
 8  be2960.ccr22.muc03.atlas.cogentco.com (154.54.36.254) [AS174]  10.455 ms  10.509 ms  10.447 ms  10.704 ms  11.138 ms  10.925 ms  11.076 ms  10.714 ms  10.754 ms  11.415 ms
 9  be3073.ccr52.zrh02.atlas.cogentco.com (130.117.0.61) [AS174]  16.084 ms  16.360 ms  15.599 ms  15.914 ms  15.690 ms  15.630 ms  16.017 ms  15.809 ms  15.885 ms  16.603 ms
10  be3586.rcr21.mil01.atlas.cogentco.com (154.54.60.114) [AS174]  20.069 ms  19.954 ms  21.156 ms  19.866 ms  20.070 ms  19.945 ms  19.943 ms  21.001 ms  19.968 ms  20.045 ms
11  149.6.153.170 (149.6.153.170) [AS174]  20.858 ms  21.099 ms  20.880 ms  20.699 ms  21.033 ms  20.687 ms  20.798 ms  20.901 ms  20.725 ms  20.840 ms
12  lsr-tis1-te16.metrolink.it (212.45.159.57) [AS8816]  21.133 ms  21.452 ms  21.314 ms  21.334 ms  21.102 ms  21.249 ms  21.359 ms  21.087 ms  21.153 ms  21.225 ms
13  asr-sal1-te24.metrolink.it (212.45.137.189) [AS8816]  21.373 ms  21.586 ms  21.139 ms  21.130 ms  21.143 ms  21.061 ms  21.225 ms  21.564 ms  21.061 ms  21.327 ms
14  gw-mcs.metrolink.it (212.45.139.102) [AS8816]  20.314 ms  20.278 ms  20.301 ms  20.300 ms  20.543 ms  20.307 ms  20.462 ms  20.359 ms  20.332 ms  20.499 ms
15  * * * * * * * * * *

FWIW, in my case the problem is in the direction from the server, so mtr/traceroute needs to be running there. Client requests can reach the server without a problem, but the responses are dropped as they enter the Cogent network. When I change the port number it doesn’t happen, so it is specific to NTP. In mtr (patched to allow a source port < 1024) it looks like this:

 2. 80.211.127.6          0.0%   102   17.5  22.1   1.8  90.8  17.7
 3. 62.149.185.100        0.0%   102    1.0   1.3   0.6   6.1   0.9
 4. 149.6.18.57          88.1%   102    1.0   1.3   1.0   3.1   0.6
 5. 154.54.36.225        85.1%   102    5.5   5.6   5.5   5.9   0.1
 6. 154.54.59.2          81.2%   102   10.2  10.4  10.1  11.5   0.3
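
(A note for anyone without a patched mtr: a stock Linux traceroute can send the same kind of probes from source port 123 when run as root - roughly like this, with the target address being a placeholder. It just doesn’t give you mtr’s running loss statistics.)

~# traceroute -U --sport=123 -q 10 192.0.2.1    # root is needed to bind the privileged source port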

That’s interesting…
Could you run this kind of MTR to one of my pool servers (217.144.138.234)? I would then run a traceroute back to you from that site.
With a bit of luck, the routing is symmetrical :slight_smile:

You can traceroute from the server (or one really close to it) with https://trace.ntppool.org/traceroute/8.8.8.8

(The debugger there also has a simple NTP client: https://trace.ntppool.org/ntp/17.253.4.125 )

I tried a few different client machines, but I didn’t see any packet loss to that server. The packets don’t seem to go over the Cogent network. Also, it seems to have a perfect score, so I’m not sure what issue you are actually debugging here. :slight_smile:

I had hoped it would be reached via Cogent - then we would have had the chance to run traces between the two sites to see where exactly the packets start getting dropped.
That’s simply what I wanted to see :wink:

But I’m not entirely sure Cogent is the culprit behind all of these problems.
I have now had two events, roughly two weeks apart, where my pool server located at OVH in France got dropped out of the pool because the monitor couldn’t reach it.
Cogent was not involved from either site, OVH or the monitoring station; from both ends, packets traversed CoreSite (“any2ix”). And it happened only to IPv4; IPv6 was running fine.

For debugging purposes I built a GRE tunnel to my router on my network in Germany and started routing some subnets (e.g. 207.171.0.0/19, where I suspected the monitoring station to be) over that tunnel.
From there the traffic used Cogent to reach the monitor subnet, and my scores started rising steadily again.
So it might be totally unrelated and just erroneously filtered traffic at OVH.
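
For the record, the tunnel setup was nothing fancy - roughly the usual iproute2 commands; the endpoint addresses here are placeholders, not my real ones:

~# ip tunnel add gre1 mode gre local 192.0.2.10 remote 198.51.100.20 ttl 255
~# ip link set gre1 up
~# ip route add 207.171.0.0/19 dev gre1    # send traffic for the suspected monitor subnet via Germany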

I’ve noticed that my IPv6 addresses have much better scores. This is what I see from NJ:

I manage 5 servers (three in Paris and two on dedicated OVH servers), active since 2009 for the oldest:
One IPv6 only
Two IPv4 only
Two IPv4 + IPv6
For the last 3 months I have had the same monitoring problems from Los Angeles. And for the two IPv4+IPv6 servers, when the IPv4 address seems down to LA, the IPv6 one is OK.
I do my own monitoring of these servers without any problem, so I have changed nothing on them, but it seems there is a problem with the pool monitoring from LA.

traceroute -A -f 5 -U -p 123  -q 5 '212.45.144.88' '48' -n
traceroute to 212.45.144.88 (212.45.144.88), 30 hops max, 48 byte packets
 5  78.254.242.162 [AS12322]  6.505 ms  6.584 ms  6.728 ms  6.883 ms  7.030 ms
 6  78.254.248.146 [AS12322]  1.388 ms  1.532 ms  1.484 ms  1.612 ms  1.567 ms
 7  194.149.160.105 [AS12322]  1.635 ms  1.586 ms  1.694 ms  1.689 ms  1.552 ms
 8  194.149.166.54 [AS12322]  2.159 ms  2.119 ms  1.840 ms  1.904 ms  1.861 ms
 9  * * * * *
10  4.69.211.21 [AS3356]  20.438 ms  20.399 ms  20.606 ms  20.622 ms  20.602 ms
11  4.69.211.21 [AS3356]  20.741 ms  197.056 ms  222.826 ms  222.892 ms  223.037 ms
12  213.242.125.46 [AS3356/AS9057]  21.434 ms  21.555 ms  21.508 ms  21.455 ms  21.399 ms
13  212.45.159.57 [AS8816]  21.949 ms  21.912 ms  21.930 ms  21.656 ms  21.689 ms
14  212.45.137.189 [AS8816]  22.076 ms  22.324 ms  22.355 ms  22.189 ms  22.065 ms
15  212.45.139.102 [AS8816]  20.862 ms  20.722 ms  20.560 ms  20.793 ms  20.886 ms
16  * * * * *

traceroute -A -f 5 -U -p 123  -q 5 '212.45.144.88' '48' -n
traceroute to 212.45.144.88 (212.45.144.88), 30 hops max, 48 byte packets
 5  80.249.210.226 [AS1200]  24.379 ms  24.158 ms  24.029 ms  23.942 ms  24.098 ms
 6  212.45.159.57 [AS8816]  24.307 ms  24.264 ms  24.401 ms  24.077 ms  24.547 ms
 7  212.45.137.189 [AS8816]  25.556 ms  25.456 ms  25.367 ms  25.286 ms  25.206 ms
 8  212.45.139.102 [AS8816]  23.264 ms  23.597 ms  23.456 ms  23.576 ms  23.496 ms
 9  * * * * *

traceroute -A -f 5 -U -p 123  -q 5 '212.45.144.88' '48' -n
traceroute to 212.45.144.88 (212.45.144.88), 30 hops max, 48 byte packets
 5  94.23.122.243 [AS16276]  5.904 ms  5.997 ms 178.33.100.161 [AS16276]  5.912 ms 94.23.122.243 [AS16276]  6.064 ms 178.33.100.161 [AS16276]  5.939 ms
 6  80.249.210.226 [AS1200]  22.743 ms  22.849 ms  22.992 ms  22.946 ms  23.065 ms
 7  212.45.159.57 [AS8816]  23.538 ms  23.520 ms  23.709 ms  23.898 ms  23.482 ms
 8  212.45.137.189 [AS8816]  23.216 ms  23.199 ms  23.343 ms  23.331 ms  23.323 ms
 9  212.45.139.102 [AS8816]  22.126 ms  22.160 ms  22.284 ms  22.177 ms  22.229 ms
10  * * * * *
11  * * * * *

I have the same issue with both OVH and Scaleway/Online. They worked for years, and all of a sudden my IPv4 score drops into oblivion while IPv6 remains top notch. I do not have any connectivity issues besides NTP.

Since yesterday, one of my servers suddenly has a very poor score on its IPv4 address. IPv6 is working fine. This server is not located at OVH.
I’m not sure whether this coincides with the current maintenance work or not; unfortunately I’m unable to do a traceroute from the monitor’s site at the moment.
From my site, I reach 207.171.7.0/24 via Cogent - and I’m quite sure Cogent was also used in the other direction a few weeks ago.

So it might be a Cogent-related problem. I’ll try to open a ticket there… :-/

JFTR:

~# traceroute -f 3 -4 -I -q 1 -U -p 123 trace.ntppool.org.
traceroute to trace.ntppool.org. (207.171.7.45), 30 hops max, 60 byte packets
 3  te0-7-0-5.rcr21.dus01.atlas.cogentco.com (149.6.139.113)  1.603 ms
 4  be2114.agr21.ams03.atlas.cogentco.com (130.117.48.61)  4.832 ms
 5  be2440.ccr42.ams03.atlas.cogentco.com (130.117.50.5)  82.810 ms
 6  be12488.ccr42.lon13.atlas.cogentco.com (130.117.51.41)  81.432 ms
 7  be2490.ccr42.jfk02.atlas.cogentco.com (154.54.42.85)  83.384 ms
 8  be2359.ccr41.jfk02.atlas.cogentco.com (154.54.43.109)  83.441 ms
 9  be2113.ccr42.atl01.atlas.cogentco.com (154.54.24.222)  100.892 ms
10  be2690.ccr42.iah01.atlas.cogentco.com (154.54.28.130)  115.310 ms
11  be2928.ccr21.elp01.atlas.cogentco.com (154.54.30.162)  128.409 ms
12  be2929.ccr31.phx01.atlas.cogentco.com (154.54.42.65)  139.057 ms
13  be2931.ccr41.lax01.atlas.cogentco.com (154.54.44.86)  149.081 ms
14  be3360.ccr41.lax04.atlas.cogentco.com (154.54.25.150)  150.413 ms
15  *
16  *
17  *
18  *

If you look at your offset graph, you will see the downturn happened when the monitoring station moved from Los Angeles to Newark, NJ. (Mine improved.)

This is linked to the maintenance. Also, trace. has not been migrated yet, so the traceroute is not showing the current path. You can try it against the URL of this site (community.ntppool.com), since that is already at the new location.

Using the HE looking glass (lg.he.net) seems fine, but I’m not sure how representative that is.

A major update on this!

My server still seems to be unreachable from the monitoring station since the maintenance weekend.
The very same server is reachable on IPv6 without any problems.

With tcpdump I can see that (presumably) the monitor is querying my server and replies are being sent - but apparently these are getting lost somewhere on the way from my server to the monitor.

These are the packets captured with tcpdump on 2019-05-09 (time is UTC):

22:32:28.678055 IP 139.178.64.42.54313 > 217.144.138.234.123: NTPv4, Client, length 48
22:32:28.678510 IP 217.144.138.234.123 > 139.178.64.42.54313: NTPv4, Server, length 48
22:47:41.217332 IP 139.178.64.42.41577 > 217.144.138.234.123: NTPv4, Client, length 48
22:47:41.217772 IP 217.144.138.234.123 > 139.178.64.42.41577: NTPv4, Server, length 48
23:02:53.329398 IP 139.178.64.42.49108 > 217.144.138.234.123: NTPv4, Client, length 48
23:02:53.329835 IP 217.144.138.234.123 > 139.178.64.42.49108: NTPv4, Server, length 48
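
(For completeness: the capture filter is nothing special, essentially just the NTP port and the monitor’s address, along these lines:)

~# tcpdump -ni eth0 udp port 123 and host 139.178.64.42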

The monitor’s CSV log shows “i/o timeout” for most of these queries:

ts_epoch,ts,offset,step,score,monitor_id,monitor_name,leap,error
1557442976,"2019-05-09 23:02:56",0,-5,-63.7,6,"Newark, NJ, US",,"i/o timeout"
1557442976,"2019-05-09 23:02:56",0,-5,-63.7,,,,"i/o timeout"
1557442061,"2019-05-09 22:47:41",-0.002267079,1,-61.8,6,"Newark, NJ, US",0,
1557442061,"2019-05-09 22:47:41",-0.002267079,1,-61.8,,,0,
1557441151,"2019-05-09 22:32:31",0,-5,-66.1,6,"Newark, NJ, US",,"i/o timeout"
1557441151,"2019-05-09 22:32:31",0,-5,-66.1,,,,"i/o timeout"
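
(Counting the failed probes in a saved copy of that CSV log - assuming it is in log.csv - is a quick one-liner:)

~# grep -c 'i/o timeout' log.csv    # number of probes the monitor logged as timed out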

While doing some traceroutes I noticed something weird:
the packets originate from source port 123/UDP, targeted at a random high port on the monitoring station (as seen in tcpdump).
It looks like these packets are getting rate-limited, probably by CenturyLink (f.k.a. Level3)!

~# traceroute -z 0.6 -w 0.5 -U --sport=123 -p 54313 -q 10 -t 0xb8 -A 139.178.64.42
traceroute to 139.178.64.42 (139.178.64.42), 30 hops max, 60 byte packets
 1  ipv4gate.ntwk-w.301-moved.de (217.144.138.225) [AS15987/AS8820]  0.473 ms  0.382 ms  0.277 ms  0.406 ms  0.322 ms  0.330 ms  0.324 ms  0.327 ms  0.336 ms  0.273 ms
 2  r4-pty.wup.tal.de (81.92.2.89) [AS8820]  0.463 ms  0.244 ms  0.397 ms  0.393 ms  0.321 ms  0.330 ms  1.973 ms  0.309 ms  0.296 ms  0.460 ms
 3  xe-9-1-2.edge4.dus1.level3.net (194.54.94.65) [AS41692]  1.532 ms  1.171 ms  0.867 ms  0.838 ms  0.820 ms  0.893 ms  1.015 ms  0.930 ms  0.896 ms  1.252 ms
 4  * * * * * * * * * *
 5  * nyc2-brdr-02.inet.qwest.net (63.235.42.101) [AS209]  78.080 ms  78.133 ms * *  78.113 ms * * * *
 6  dca-edge-22.inet.qwest.net (67.14.6.142) [AS209]  114.805 ms * *  87.070 ms * * *  87.064 ms  86.962 ms *
 7  * 72.165.161.86 (72.165.161.86) [AS209]  86.660 ms * * * *  86.746 ms *  86.777 ms  86.763 ms
 8  * * * * lag32.fr3.lga.llnw.net (68.142.88.157) [AS22822]  83.146 ms * *  83.050 ms *  83.028 ms
 9  * * * * * * * * * *
10  0.xe-1-0-0.bbr1.ewr1.packet.net (198.16.4.94) [AS5485/AS54825]  83.947 ms * *  89.428 ms *  84.074 ms  83.961 ms *  84.064 ms *
11  * * * * * * * * * *
12  * * * * * * * * * *
13  monewr1.ntppool.net (139.178.64.42) [AS54825]  83.946 ms * *  84.061 ms  84.089 ms *  84.031 ms * * *

When doing the same traceroute but changing only the source port to a random high port, the result looks fine:

~# traceroute -z 0.6 -w 0.5 -U --sport=51553 -p 54313 -q 10 -t 0xb8 -A 139.178.64.42
traceroute to 139.178.64.42 (139.178.64.42), 30 hops max, 60 byte packets
 1  ipv4gate.ntwk-w.301-moved.de (217.144.138.225) [AS15987/AS8820]  0.607 ms  0.281 ms  0.294 ms  0.286 ms  0.341 ms  0.305 ms  0.243 ms  0.255 ms  0.289 ms  0.262 ms
 2  r4-pty.wup.tal.de (81.92.2.89) [AS8820]  0.405 ms  2.329 ms  28.117 ms  4.922 ms  0.493 ms  0.314 ms  0.295 ms  0.198 ms  1.087 ms  0.272 ms
 3  xe-9-1-2.edge4.dus1.level3.net (194.54.94.65) [AS41692]  0.850 ms  0.974 ms  0.876 ms  0.877 ms  0.952 ms  0.946 ms  0.859 ms  1.212 ms  0.971 ms  1.540 ms
 4  * * * * * * * * * *
 5  nyc2-brdr-02.inet.qwest.net (63.235.42.101) [AS209]  78.133 ms  78.162 ms  81.005 ms  89.660 ms  77.895 ms  78.105 ms  77.973 ms  78.022 ms  78.158 ms  78.083 ms
 6  dca-edge-22.inet.qwest.net (67.14.6.142) [AS209]  86.963 ms  104.652 ms  86.935 ms  87.029 ms  87.147 ms  86.899 ms  90.714 ms  86.926 ms  87.130 ms  87.038 ms
 7  72.165.161.86 (72.165.161.86) [AS209]  86.726 ms  86.630 ms  86.687 ms  86.644 ms  86.707 ms  86.710 ms  86.751 ms  86.667 ms  86.647 ms  87.720 ms
 8  lag32.fr3.lga.llnw.net (68.142.88.157) [AS22822]  82.979 ms  82.989 ms  83.143 ms  83.084 ms  83.030 ms  82.964 ms  83.022 ms  83.035 ms  83.041 ms  82.956 ms
 9  * * * * * * * * * *
10  0.xe-1-0-0.bbr1.ewr1.packet.net (198.16.4.94) [AS5485/AS54825]  84.579 ms  84.652 ms  83.948 ms  83.885 ms  84.337 ms  92.428 ms  83.922 ms  84.265 ms  84.051 ms  83.884 ms
11  * * * * * * * * * *
12  * * * * * * * * * *
13  monewr1.ntppool.net (139.178.64.42) [AS54825]  84.023 ms  84.070 ms  83.924 ms  83.882 ms  83.872 ms  84.068 ms  84.052 ms  83.909 ms  83.814 ms  83.931 ms

(To be more accurate, I set ToS = 0xb8, since this is the tag my ntpd applies to all outgoing packets - but to be honest, it didn’t change anything.)
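
(The marking itself is easy to verify on the wire, by the way; with -v, tcpdump prints the ToS byte of each packet:)

~# tcpdump -v -ni eth0 udp src port 123    # look for "tos 0xb8" in the IP header line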

Evidently, the culprit is CenturyLink! Since the “Qwest” brand apparently is still used by CenturyLink nearly 10 years after acquiring them, there is simply no other provider between hops 3 and 5, where the packet loss starts.
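
(If in doubt whether those Qwest-branded hops really belong to CenturyLink: a plain whois on one of the hop addresses - it is AS209 either way - should show who owns that address space nowadays:)

~# whois 63.235.42.101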

I’ll try to open a ticket there and ask about rate-limiting on their network…

Just a short addendum:
This is not just happening towards the pool monitor, but whenever traffic traverses CenturyLink’s network.
Even the root nameservers are affected by this :wink:

~# traceroute -z 0.6 -w 0.5 -U --sport=123 -p 53 -q 5 -t 0xb8 -A h.root-servers.net.
traceroute to h.root-servers.net. (198.97.190.53), 30 hops max, 60 byte packets
 1  ipv4gate.ntwk-w.301-moved.de (217.144.138.225) [AS15987/AS8820]  0.422 ms  0.323 ms  0.307 ms  0.332 ms  0.274 ms
 2  r4-pty.wup.tal.de (81.92.2.89) [AS8820]  0.356 ms  35.711 ms  0.279 ms  0.206 ms  0.257 ms
 3  xe-9-1-2.edge4.dus1.level3.net (194.54.94.65) [AS41692]  2.085 ms  0.990 ms  0.989 ms  0.835 ms  0.918 ms
 4  * * * * *
 5  lsv2-agw1.inet.qwest.net (63.235.42.101) [AS209]  79.991 ms  79.973 ms *  78.107 ms  78.164 ms
 6  * * * * dcx2-edge-02.inet.qwest.net (67.14.28.138) [AS209]  86.949 ms
 7  * * * * *
 8  np-5-1-1-181-px-p2p.equinix-ord.core.dren.net (143.56.224.121) [AS668]  107.427 ms *  109.000 ms * *
 9  * * * * *
10  * * * 143.56.3.163 (143.56.3.163) [AS668]  129.952 ms *
11  * * * h.root-servers.net (198.97.190.53) [AS1508]  202.645 ms *


~# traceroute -z 0.6 -w 0.5 -U --sport=125 -p 53 -q 5 -t 0xb8 -A h.root-servers.net.
traceroute to h.root-servers.net. (198.97.190.53), 30 hops max, 60 byte packets
 1  ipv4gate.ntwk-w.301-moved.de (217.144.138.225) [AS15987/AS8820]  0.281 ms  0.377 ms  0.393 ms  0.375 ms  0.447 ms
 2  r4-pty.wup.tal.de (81.92.2.89) [AS8820]  0.265 ms  0.499 ms  0.328 ms  0.289 ms  0.528 ms
 3  xe-9-1-2.edge4.dus1.level3.net (194.54.94.65) [AS41692]  2.921 ms  0.900 ms  0.930 ms  0.883 ms  0.968 ms
 4  ae-1-3503.ear2.NewYork6.Level3.net (4.69.214.18) [AS3356]  78.230 ms  78.001 ms * * *
 5  63-235-42-101.dia.static.qwest.net (63.235.42.101) [AS209]  80.175 ms  78.247 ms  78.163 ms  81.879 ms  78.178 ms
 6  dcx2-edge-02.inet.qwest.net (67.14.28.138) [AS209]  86.960 ms  86.966 ms  86.938 ms  88.060 ms  86.934 ms
 7  * * * * *
 8  * * * * *
 9  * * * * *
10  143.56.3.163 (143.56.3.163) [AS668]  130.219 ms  129.877 ms  129.855 ms  129.993 ms  129.947 ms
11  h.root-servers.net (198.97.190.53) [AS1508]  129.386 ms  129.323 ms  129.230 ms  129.310 ms  129.310 ms

And this is no cosmetic issue, it is definitely affecting connectivity:

~# dig -b '217.144.138.XXX#123' @198.97.190.53 . SOA

; <<>> DiG 9.10.3-P4-Ubuntu <<>> -b 217.144.138.XXX#123 @198.97.190.53 . SOA
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
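
(For contrast: the same query with the source port left unpinned - i.e. binding only the source address and letting dig pick an ephemeral port - should come back fine if the filtering really is specific to source port 123:

~# dig -b '217.144.138.XXX' @198.97.190.53 . SOA

I’m only binding the source address here, not the port.)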