I am one of the few (many?) affected by the monitoring system (both prod and beta) giving a perfectly good pool server a low score. Traceroutes show what looks like asymmetric routing to/from the monitoring server.
If I’m interpreting correctly, as7012.net is routing outbound traffic through ntt.net (which other threads have identified as the problem), but inbound traffic comes through cogentco.com (at least from my colo). Is this as big a problem as I think it is? Can/should phyber.com do anything about this (e.g. route away from as7012.net, or ask them to investigate)?
Or, is this really a problem with Infolink that just looks like a good thing since we aren’t happy with ntt.net right now?
Traceroute to 64.251.10.152 (to my server, from curl -s https://trace.ntppool.org/traceroute/64.251.10.152):
1 gw-b.develooper.com (207.171.7.3) AS7012 1.141 1.086
2 gi1-9.r01.lax2.phyber.com (207.171.30.13) AS7012 1.080 1.063
3 te0-1-0-7.r04.lax02.as7012.net (207.171.30.61) AS7012 1.020 0.984
4 xe-0-1-0-30.r01.lsanca07.us.bb.gin.ntt.net (198.172.90.73) AS2914 0.887 0.840
5 ae-19.r00.lsanca07.us.bb.gin.ntt.net (129.250.3.235) AS2914 0.811 1.177
6 * *
7 * *
8 INFOLINK-GL.ear2.Miami1.Level3.net (4.59.90.90) AS3356 61.286 61.268
9 (64.251.1.34) AS15083 61.729 61.358
10 ge2-edge.mia.infolink.com (64.251.0.150) AS15083 61.798 61.750
11 * *
Traceroute to 207.171.3.5 (monitoring server):
1 1-30-251-64.serverpronto.com (64.251.30.1) 0.549 ms 0.597 ms 0.668 ms
2 ge2-edge.mia.infolink.com (64.251.0.149) 0.411 ms 0.445 ms 0.460 ms
3 64.251.1.33 (64.251.1.33) 0.361 ms 0.376 ms 0.391 ms
4 te0-0-1-0.nr11.b015452-0.mia01.atlas.cogentco.com (38.104.90.49) 1.241 ms 1.308 ms 1.371 ms
5 te0-0-1-1.agr12.mia01.atlas.cogentco.com (154.24.31.61) 1.045 ms te0-0-1-1.agr11.mia01.atlas.cogentco.com (154.24.31.57) 1.030 ms te0-0-1-1.agr12.mia01.atlas.cogentco.com (154.24.31.61) 1.078 ms
6 te0-4-0-0.ccr22.mia01.atlas.cogentco.com (154.54.1.169) 1.058 ms te0-4-1-0.ccr21.mia01.atlas.cogentco.com (66.28.4.217) 1.020 ms te0-4-0-0.ccr22.mia01.atlas.cogentco.com (154.54.1.169) 0.904 ms
7 be3570.ccr42.iah01.atlas.cogentco.com (154.54.84.1) 30.721 ms 30.733 ms be3569.ccr41.iah01.atlas.cogentco.com (154.54.82.241) 30.801 ms
8 be2928.ccr21.elp01.atlas.cogentco.com (154.54.30.162) 61.366 ms be2927.ccr21.elp01.atlas.cogentco.com (154.54.29.222) 60.892 ms be2928.ccr21.elp01.atlas.cogentco.com (154.54.30.162) 61.662 ms
9 be2930.ccr32.phx01.atlas.cogentco.com (154.54.42.77) 61.639 ms 61.662 ms be2929.ccr31.phx01.atlas.cogentco.com (154.54.42.65) 60.912 ms
10 be2932.ccr42.lax01.atlas.cogentco.com (154.54.45.162) 61.381 ms 61.617 ms be2931.ccr41.lax01.atlas.cogentco.com (154.54.44.86) 61.557 ms
11 be3271.ccr41.lax04.atlas.cogentco.com (154.54.42.102) 61.297 ms be3360.ccr41.lax04.atlas.cogentco.com (154.54.25.150) 61.686 ms be3271.ccr41.lax04.atlas.cogentco.com (154.54.42.102) 61.124 ms
12 te0-1-0-0.410.r04.lax02.as7012.net (38.88.197.82) 61.500 ms 61.414 ms 61.494 ms
13 te7-4.r02.lax2.phyber.com (207.171.30.62) 61.354 ms 61.404 ms 61.467 ms
14 * * *
Asymmetric routing is more or less the default on the internet.
The reason for this is how routers calculate their best paths to each other. From the point of view of your site, Cogent may be the best path, while from the point of view of AS7012 it's Level3.
There's nothing wrong with it.
This is normal and expected, even though it may not be optimal. As long as people know what they are doing and don't have some form of reverse-path (RPF) filtering in place, it is fine.
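To make the asymmetry concrete, here is a minimal sketch that collapses the two traceroutes in this thread into AS-level paths and checks whether the reverse path mirrors the forward one. The hop ASN lists are condensed from the traces above; the reverse trace doesn't print ASNs, so AS174 (Cogent's well-known ASN) is filled in for the Cogent hops.

```python
# Sketch: detect forward/reverse AS-path asymmetry from two traceroutes.
# Hop ASNs condensed from the traceroutes in this thread; AS174 (Cogent)
# is inferred for the reverse path, which didn't print ASNs.

def as_path(hops):
    """Collapse consecutive duplicate ASNs into an ordered AS-level path."""
    path = []
    for asn in hops:
        if not path or path[-1] != asn:
            path.append(asn)
    return path

# forward: Phyber (7012) -> NTT (2914) -> Level3 (3356) -> Infolink (15083)
forward = as_path([7012, 7012, 7012, 2914, 2914, 3356, 15083, 15083])
# reverse: Infolink (15083) -> Cogent (174) -> Phyber (7012)
reverse = as_path([15083, 15083, 174, 174, 174, 7012, 7012])

# Symmetric routing would make the reverse path the forward path reversed.
symmetric = reverse == list(reversed(forward))
print(forward)    # [7012, 2914, 3356, 15083]
print(reverse)    # [15083, 174, 7012]
print(symmetric)  # False
```

The AS-level view makes the pattern obvious: outbound rides NTT and Level3, inbound rides Cogent, and neither path is the mirror of the other.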
Actually, this was useful. I managed to put together a test showing that querying IPs on networks behind NTT sometimes fails, whereas querying a network that as7012 (Phyber) is peering with never fails. I'm talking to them about it (we have before but never figured out this pattern; right now it looks really obvious).
If Infolink or ServerPronto has a contract with Cogent as their main/preferred carrier, traffic is going to ride Cogent's network as much as possible until it has to switch off near the destination. Whereas on the other end, Phyber might have NTT as their preferred carrier…
It's all about least-cost routing, available bandwidth, and who is contracted with whom…
It still can be a problem, depending on the context, but once you're crossing AS boundaries it's pretty normal and generally not cause for concern. It's when it's happening on a link that's entirely within one datacenter that you want to be worried (for example, if it were happening between nodes in your ISP's network, it's probably a misconfiguration).
As of Thursday, Feb 14 at 17:00 or so, something has improved dramatically. My server’s score has returned (19.9 at time of this post), with what looks like zero dropped monitoring queries since then.
Thanks for the explanations to this old-timer. And @ask, I assume this was your doing. Thank you!
When crossing AS boundaries, it's exceedingly common to see asymmetric paths. Most large carriers use so-called "hot potato" routing. For example, let's say that something with CenturyLink (Level3) transit in the EU is trying to communicate with something with Cogent transit in the USA.
packets coming from the thing in Europe will go to CenturyLink
CenturyLink will get them to Cogent as soon as it possibly can (i.e. somewhere in Europe), passing them along like a "hot potato" for the next network to deal with
Cogent will transport the packets across the Big Pond
Cogent will deliver the packets to the thing in the USA
Coming back the other way:
packets will go from the thing in the USA to Cogent
Cogent will want to get those packets off Cogent’s network as soon as possible, to CenturyLink in the USA
CenturyLink will have to transport the packets from USA to EU over CenturyLink’s transatlantic fibres
CenturyLink will then deliver the packets to the EU-based thing
That gives you two very asymmetric paths: crossing the Atlantic west-east on CenturyLink's submarine assets, and east-west on Cogent's.
In reality this might end up using the same submarine optical system, as different wavelengths on the same fibre, or on different fibres within the same cable. But it will be different carriers’ routing equipment at each end, with different IPs, and so you’ll see very different traceroutes.
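The steps above can be sketched as a toy model. Each carrier always hands traffic off at whichever interconnect is cheapest *for itself* from where the packet entered, which is why the two directions exit at different points. The cities and cost figures here are invented purely for illustration.

```python
# Toy model of "hot potato" routing: the originating carrier exits at the
# interconnect with the lowest internal cost from the packet's entry city.
# Cities and costs below are invented for illustration only.

interconnects = ["Amsterdam", "NewYork"]  # where the two carriers peer

# internal cost (think: km of fibre) from an entry city to each
# interconnect, per carrier
cost = {
    "CenturyLink": {"Frankfurt": {"Amsterdam": 400, "NewYork": 6500},
                    "Dallas":    {"Amsterdam": 8200, "NewYork": 2200}},
    "Cogent":      {"Frankfurt": {"Amsterdam": 450, "NewYork": 6400},
                    "Dallas":    {"Amsterdam": 8100, "NewYork": 2300}},
}

def handoff(carrier, entry_city):
    """Hot potato: exit at the interconnect cheapest for this carrier."""
    return min(interconnects, key=lambda ix: cost[carrier][entry_city][ix])

# EU -> USA: CenturyLink customer in Frankfurt talks to a Cogent
# customer in Dallas; CenturyLink dumps the packet in Europe and
# Cogent hauls it across the Atlantic.
print(handoff("CenturyLink", "Frankfurt"))  # Amsterdam

# USA -> EU: the reply enters Cogent in Dallas; Cogent dumps it at its
# cheapest US exit and CenturyLink hauls it back across the Atlantic.
print(handoff("Cogent", "Dallas"))          # NewYork
```

Each direction crosses the ocean on a different carrier's backbone, which is exactly the asymmetry the traceroutes in this thread show.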