Holy asymmetrical routes, Batman! Is this an as7012.net routing issue?

I am one of the few (many?) affected by the monitoring system (both prod and beta) giving a perfectly good pool server a low score. Traceroutes show what looks like asymmetrical routing to/from the monitoring server.

If I’m interpreting correctly, as7012.net is routing outbound traffic through ntt.net (which other threads have identified as the problem), but inbound traffic comes through cogentco.com (at least from my colo). Is this as big a problem as I think it is? Can/should phyber.com do anything about this (e.g. route away from as7012.net, or ask them to investigate)?

Or, is this really a problem with Infolink that just looks like a good thing since we aren’t happy with ntt.net right now?

Traceroute to 64.251.10.152 (to my server from curl -s https://trace.ntppool.org/traceroute/64.251.10.152)
1 gw-b.develooper.com (207.171.7.3) AS7012  1.141  1.086
2 gi1-9.r01.lax2.phyber.com (207.171.30.13) AS7012  1.080  1.063
3 te0-1-0-7.r04.lax02.as7012.net (207.171.30.61) AS7012  1.020  0.984
4 xe-0-1-0-30.r01.lsanca07.us.bb.gin.ntt.net (198.172.90.73) AS2914  0.887  0.840
5 ae-19.r00.lsanca07.us.bb.gin.ntt.net (129.250.3.235) AS2914  0.811  1.177
6  *  *
7  *  *
8 INFOLINK-GL.ear2.Miami1.Level3.net (4.59.90.90) AS3356  61.286  61.268
9 (64.251.1.34) AS15083  61.729  61.358
10 ge2-edge.mia.infolink.com (64.251.0.150) AS15083  61.798  61.750
11  *  *


traceroute to 207.171.3.5 (monitoring server)
 1  1-30-251-64.serverpronto.com (64.251.30.1)  0.549 ms  0.597 ms  0.668 ms
 2  ge2-edge.mia.infolink.com (64.251.0.149)  0.411 ms  0.445 ms  0.460 ms
 3  64.251.1.33 (64.251.1.33)  0.361 ms  0.376 ms  0.391 ms
 4  te0-0-1-0.nr11.b015452-0.mia01.atlas.cogentco.com (38.104.90.49)  1.241 ms  1.308 ms  1.371 ms
 5  te0-0-1-1.agr12.mia01.atlas.cogentco.com (154.24.31.61)  1.045 ms te0-0-1-1.agr11.mia01.atlas.cogentco.com (154.24.31.57)  1.030 ms te0-0-1-1.agr12.mia01.atlas.cogentco.com (154.24.31.61)  1.078 ms
 6  te0-4-0-0.ccr22.mia01.atlas.cogentco.com (154.54.1.169)  1.058 ms te0-4-1-0.ccr21.mia01.atlas.cogentco.com (66.28.4.217)  1.020 ms te0-4-0-0.ccr22.mia01.atlas.cogentco.com (154.54.1.169)  0.904 ms
 7  be3570.ccr42.iah01.atlas.cogentco.com (154.54.84.1)  30.721 ms  30.733 ms be3569.ccr41.iah01.atlas.cogentco.com (154.54.82.241)  30.801 ms
 8  be2928.ccr21.elp01.atlas.cogentco.com (154.54.30.162)  61.366 ms be2927.ccr21.elp01.atlas.cogentco.com (154.54.29.222)  60.892 ms be2928.ccr21.elp01.atlas.cogentco.com (154.54.30.162)  61.662 ms
 9  be2930.ccr32.phx01.atlas.cogentco.com (154.54.42.77)  61.639 ms  61.662 ms be2929.ccr31.phx01.atlas.cogentco.com (154.54.42.65)  60.912 ms
10  be2932.ccr42.lax01.atlas.cogentco.com (154.54.45.162)  61.381 ms  61.617 ms be2931.ccr41.lax01.atlas.cogentco.com (154.54.44.86)  61.557 ms
11  be3271.ccr41.lax04.atlas.cogentco.com (154.54.42.102)  61.297 ms be3360.ccr41.lax04.atlas.cogentco.com (154.54.25.150)  61.686 ms be3271.ccr41.lax04.atlas.cogentco.com (154.54.42.102)  61.124 ms
12  te0-1-0-0.410.r04.lax02.as7012.net (38.88.197.82)  61.500 ms  61.414 ms  61.494 ms
13  te7-4.r02.lax2.phyber.com (207.171.30.62)  61.354 ms  61.404 ms  61.467 ms
14  * * *
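
To make the comparison easier to eyeball, here is a minimal Python sketch that collapses a traceroute into the ordered list of operator domains it passes through. It is only an illustration: the `carriers` helper and the regex are my own, and treating the last two labels of a hostname as "the operator" is a naive assumption that breaks for suffixes like co.uk. The sample text is a handful of hops copied from the two traces above.

```python
import re

# Matches "hostname (a.b.c.d)" and captures the hostname. Hops that show only
# an IP address or "*" have no hostname and are skipped.
HOP_RE = re.compile(r"([A-Za-z0-9.-]+\.[A-Za-z]{2,})\s+\(\d+\.\d+\.\d+\.\d+\)")


def carriers(trace_text):
    """Collapse a traceroute into the ordered list of distinct domains seen."""
    seen = []
    for host in HOP_RE.findall(trace_text):
        # Naive assumption: the last two labels identify the operator.
        domain = ".".join(host.lower().split(".")[-2:])
        if not seen or seen[-1] != domain:
            seen.append(domain)
    return seen


# A few hops copied from each direction of the traces above.
TO_SERVER = """
2 gi1-9.r01.lax2.phyber.com (207.171.30.13) AS7012  1.080  1.063
3 te0-1-0-7.r04.lax02.as7012.net (207.171.30.61) AS7012  1.020  0.984
4 xe-0-1-0-30.r01.lsanca07.us.bb.gin.ntt.net (198.172.90.73) AS2914  0.887  0.840
8 INFOLINK-GL.ear2.Miami1.Level3.net (4.59.90.90) AS3356  61.286  61.268
10 ge2-edge.mia.infolink.com (64.251.0.150) AS15083  61.798  61.750
"""
TO_MONITOR = """
 2  ge2-edge.mia.infolink.com (64.251.0.149)  0.411 ms  0.445 ms  0.460 ms
 4  te0-0-1-0.nr11.b015452-0.mia01.atlas.cogentco.com (38.104.90.49)  1.241 ms  1.308 ms  1.371 ms
12  te0-1-0-0.410.r04.lax02.as7012.net (38.88.197.82)  61.500 ms  61.414 ms  61.494 ms
13  te7-4.r02.lax2.phyber.com (207.171.30.62)  61.354 ms  61.404 ms  61.467 ms
"""

print(carriers(TO_SERVER))
print(carriers(TO_MONITOR))
```

With these excerpts, one direction goes through ntt.net and level3.net while the other goes through cogentco.com, which is the asymmetry I am asking about.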

Asymmetrical routing is more or less the default on the internet :wink:
The reason is how routers calculate their best paths to each other. From the point of view of your site, Cogent may be the best path; from the point of view of AS7012, it’s Level3.
There’s nothing wrong with it.

This is normal and expected, even though it may not be optimal. As long as people know what they are doing and don’t have some form of reverse-path (RP) filtering in place, it is fine.
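
If you want to check the RP filter setting on a Linux host, here is a minimal sketch in Python. It assumes the standard `/proc/sys/net/ipv4/conf/<interface>/rp_filter` file; the helper names are made up for this example. Note that the kernel uses the higher of the `all` value and the per-interface value when validating.

```python
from pathlib import Path

# Meaning of the Linux rp_filter sysctl values.
RP_FILTER_MODES = {
    0: "off (no source validation)",
    1: "strict (packet is dropped unless it arrived on the interface "
       "that is the best route back to its source)",
    2: "loose (packet is dropped only if its source is not reachable "
       "via any interface)",
}


def describe_rp_filter(value):
    """Translate the raw sysctl value (e.g. '1\\n') into a description."""
    mode = int(value.strip())
    return RP_FILTER_MODES.get(mode, f"unknown value {mode}")


def read_rp_filter(interface="all", root="/proc/sys/net/ipv4/conf"):
    """Read and describe rp_filter for one interface (Linux only)."""
    return describe_rp_filter((Path(root) / interface / "rp_filter").read_text())


if __name__ == "__main__":
    try:
        print("rp_filter:", read_rp_filter())
    except OSError as exc:
        # Not Linux, or the sysctl tree is not readable here.
        print("could not read rp_filter:", exc)
```

Strict mode is the one that can bite with asymmetric paths on a multi-homed host, since the return route for a packet may point out of a different interface than the one it arrived on.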

Actually, this was useful. I managed to make a test showing that querying IPs on networks behind NTT sometimes fails, whereas querying a network that as7012 (Phyber) is peering with never fails. I’m talking to them about it (we have before but never figured out this pattern; right now it looks really obvious).
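
For anyone who wants to try a comparison like this themselves, here is a rough Python sketch. To be clear, this is not the test described above, just one way such a check could look: it sends plain SNTP client requests over UDP and counts how many get no reply. The function names, counts and timeouts are arbitrary choices for the example.

```python
import socket


def build_sntp_request():
    """48-byte SNTP client request: LI=0, VN=4, Mode=3 -> first byte 0x23."""
    return bytes([0x23]) + bytes(47)


def query_once(host, timeout=2.0, port=123):
    """Send one request; return True if any reply arrives before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_sntp_request(), (host, port))
            sock.recvfrom(512)
            return True
        except OSError:
            # Covers timeouts, DNS failures and ICMP "port unreachable".
            return False


def loss_rate(host, count=20, timeout=2.0, port=123):
    """Fraction of `count` queries to `host` that got no reply."""
    lost = sum(1 for _ in range(count) if not query_once(host, timeout, port))
    return lost / count
```

Calling `loss_rate()` for one server reached via NTT and one reached via a direct peer, from the same vantage point, would give two numbers to compare. Keep `count` small and pace the queries if you point this at servers you do not run.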

I guess I’m old-school. We used to think that was a mis-configuration… LOL Thanks for replying!

If Infolink or ServerPronto has a contract with Cogent as their main / preferred carrier, traffic is going to ride Cogent as much as possible until it has to switch off near the destination. Whereas on the other end, Phyber might have NTT as their preferred carrier…

It’s all about least-cost routing, available bandwidth, and who is contracted with whom…

It still can be, depending on the context, but once you’re crossing AS boundaries it’s pretty normal and generally not cause for concern. It’s when it’s happening on a link that’s entirely within one datacenter that you want to be worried (for example, if it was happening between nodes in your ISP’s network, it’s probably misconfiguration).

As of Thursday, Feb 14 at 17:00 or so, something has improved dramatically. My server’s score has returned (19.9 at time of this post), with what looks like zero dropped monitoring queries since then.

Thanks for the explanations to this old-timer. And @ask, I assume this was your doing. Thank you!

Crossing AS boundaries it’s exceedingly common to see asymmetric paths. Most large carriers adopt so-called “hot potato” routing. For example, let’s say that something with CenturyLink (Level3) transit in the EU is trying to communicate with something with Cogent transit in the USA.

  • packets coming from the thing in Europe will go to CenturyLink
  • CenturyLink will get them to Cogent as soon as they possibly can (i.e. somewhere in Europe), passing them like a “hot potato” as quickly as possible to the next network to deal with
  • Cogent will transport the packets across the Big Pond
  • Cogent will deliver the packets to the thing in the USA

Coming back the other way:

  • packets will go from the thing in the USA to Cogent
  • Cogent will want to get those packets off Cogent’s network as soon as possible, to CenturyLink in the USA
  • CenturyLink will have to transport the packets from USA to EU over CenturyLink’s transatlantic fibres
  • CenturyLink will then deliver the packets to the EU-based thing

That gives you two very asymmetric paths, potentially crossing the Atlantic west-east on CenturyLink’s submarine assets; and crossing east-west on Cogent’s.

In reality this might end up using the same submarine optical system, as different wavelengths on the same fibre, or on different fibres within the same cable. But it will be different carriers’ routing equipment at each end, with different IPs, and so you’ll see very different traceroutes.
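
The two walk-throughs above can be sketched as a toy model in Python. This is purely illustrative: it assumes exactly two carriers that peer in both regions, and the `Segment` type and function names are invented for the example. Real BGP path selection involves far more than this.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    carrier: str
    start: str  # region where this carrier picks the packet up
    end: str    # region where this carrier hands it off or delivers it


def hot_potato_path(src_region, src_carrier, dst_region, dst_carrier):
    """The source's carrier hands off at the nearest peering point, i.e. in
    its own region, so the destination's carrier does the long haul."""
    return [
        Segment(src_carrier, src_region, src_region),
        Segment(dst_carrier, src_region, dst_region),
    ]


def long_haul_carrier(path):
    """Carrier of the first segment that crosses between regions."""
    for seg in path:
        if seg.start != seg.end:
            return seg.carrier
    return None


eu_to_us = hot_potato_path("EU", "CenturyLink", "US", "Cogent")
us_to_eu = hot_potato_path("US", "Cogent", "EU", "CenturyLink")

print("EU -> US crosses the Atlantic on:", long_haul_carrier(eu_to_us))
print("US -> EU crosses the Atlantic on:", long_haul_carrier(us_to_eu))
```

Each direction crosses the ocean on a different carrier (Cogent for EU to US, CenturyLink for US to EU), which is exactly why the two traceroutes look nothing alike.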


In case this information is useful in working out some of the issues, this is what we get when tracing from the ntp servers and from our servers back.

Traceroute to 103.51.68.133
1 . (207.171.7.3) AS7012 0.213 0.172
2 . (207.171.30.13) AS7012 0.507 0.527
3 . (207.171.30.61) AS7012 0.982 0.972
4 . (206.72.210.122) 0.726 0.715
5 . (64.62.151.126) AS6939 0.979 1.149
6 . (49.255.255.14) AS4826 205.031 205.028
7 . (49.255.255.3) AS4826 198.943 198.937
8 . (49.255.255.13) AS4826 200.035 196.014
9 . (49.255.255.11) AS4826 208.272 208.914
10 . (114.31.206.129) AS4826 209.639 209.605
11 (114.31.206.43) AS4826 195.442
11 . (114.31.206.129) AS4826 208.913
12 (175.45.100.202) AS4826 209.355
12 . (114.31.206.43) AS4826 202.907
13 (175.45.100.202) AS4826 205.288
13 . (203.161.65.62) AS4826 209.221
14 . (203.161.65.62) AS9822 208.346 209.221
15 . (203.161.65.94) AS9822 208.841 209.900
16 . (203.161.94.2) AS9822 * 207.891
17 . (103.51.68.133) AS134076 213.333 209.129

traceroute to 207.171.7.3 (207.171.7.3), 30 hops max, 60 byte packets
1 . (103.51.68.1) 0.099 ms 0.111 ms 0.096 ms
2 * * *
3 * * *
4 . (203.161.65.93) 1.162 ms 1.237 ms 1.264 ms
5 . (203.161.65.65) 1.202 ms 1.272 ms 1.297 ms
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 . (207.171.30.62) 213.142 ms 210.623 ms 205.773 ms
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *

Note: I had to replace all the DNS names with full stops, as I am a new member and it won’t let more than 2 URLs through.

Shane