Problem with Monitor...underway UDP is lost at packet-dot-net

Hi there,

I have been searching for days to find out why I am getting bad scores.
I was unable to find the cause; no matter what I tried, it never got better.

So I started tracing the route to the monitor station, and behold: on the way, between Zayo.com and Packet.net, all packets are dropped.

It’s my belief that Zayo.com has problems, but I could be wrong.
Maybe others can run a trace as well?

This can’t be bad, can it?

root@server:/# chronyc tracking
Reference ID : 47505300 (GPS)
Stratum : 1
Ref time (UTC) : Fri Oct 11 13:25:44 2019
System time : 0.000012358 seconds fast of NTP time
Last offset : +0.000012565 seconds
RMS offset : 0.000442996 seconds
Frequency : 0.440 ppm fast
Residual freq : -0.177 ppm
Skew : 31.578 ppm
Root delay : 0.000000 seconds
Root dispersion : 0.001194 seconds
Update interval : 16.0 seconds
Leap status : Normal
root@server:/#

Seems to me the time is pretty much spot on.
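
For anyone who wants to check this from a script rather than by eye, here is a minimal Python sketch that parses the `chronyc tracking` text above. The function names are made up for illustration, and it assumes the `Name : value` layout shown in the output.

```python
import re


def parse_tracking(text):
    """Turn 'Name : value' lines of chronyc tracking output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip prompts and any line without the separator
            fields[key.strip()] = value.strip()
    return fields


def system_offset_seconds(fields):
    """Signed system clock offset: positive means fast of NTP time."""
    match = re.match(r"([\d.]+) seconds (fast|slow) of NTP time",
                     fields["System time"])
    if match is None:
        raise ValueError("unexpected 'System time' format")
    value = float(match.group(1))
    return value if match.group(2) == "fast" else -value
```

Fed the output above, this reports stratum 1 and an offset of roughly 12 microseconds fast, which supports the "spot on" reading.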

Thanks,
Bas.

Not spam.

Here is the tracking:

Start: Fri Oct 11 15:33:00 2019
HOST: server Loss% Snt Last Avg Best Wrst StDev
1.|-- fritz.box 0.0% 10 0.5 0.6 0.3 3.1 0.7
2.|-- bras01.bxl.be.edpnet.net 0.0% 10 8.3 9.1 7.9 13.9 1.6
3.|-- router01.bruix.be.edpnet. 0.0% 10 9.3 9.1 8.6 9.8 0.0
4.|-- router02.bruix.be.edpnet. 0.0% 10 13.5 18.8 13.5 30.0 5.5
5.|-- router01.adamtel.nl.edpne 0.0% 10 21.5 22.0 21.0 24.4 0.9
6.|-- router01.frank.de.edpnet. 0.0% 10 21.5 21.1 20.6 21.5 0.0
7.|-- xe-1-2-0.mpr1.fra4.de.abo 0.0% 10 21.2 24.0 20.9 38.2 5.4
8.|-- ae8.mpr1.fra3.de.zip.zayo 0.0% 10 21.8 21.5 20.9 22.3 0.0
9.|-- ae27.cs1.fra6.de.eth.zayo 0.0% 10 97.8 101.2 90.6 109.8 6.4
10.|-- ae2.cs1.ams17.nl.eth.zayo 0.0% 10 90.5 91.2 90.5 93.2 0.6
11.|-- ae0.cs1.ams10.nl.eth.zayo 0.0% 10 91.0 93.0 90.6 111.9 6.6
12.|-- ae2.cs1.lhr15.uk.eth.zayo 0.0% 10 91.0 91.6 90.7 93.3 0.7
13.|-- ae0.cs1.lhr11.uk.eth.zayo 0.0% 10 91.4 92.1 91.1 95.7 1.2
14.|-- ae5.cs1.lga5.us.eth.zayo. 10.0% 10 91.3 91.4 90.7 92.9 0.6
15.|-- ae15.er1.lga5.us.zip.zayo 0.0% 10 90.2 92.4 90.2 98.3 2.4
16.|-- 64.125.54.26 0.0% 10 90.6 94.2 90.5 122.7 10.0
17.|-- 39.ae32.bbr2.ewr1.packet. 0.0% 10 91.3 91.9 91.2 96.1 1.3
18.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
19.|-- 147.75.98.105 0.0% 10 108.1 108.0 103.3 112.6 3.1
20.|-- monewr1.ntppool.net 0.0% 10 92.1 94.7 91.5 107.5 5.5
root@server:~#

Not spam.

I’m very sure Packet has problems, as they talk about IPv4 problems at EWR1, and right after that hop the loss is 100%.

On their status pages: https://status.packet.com/history

It states:

EWR1: IPv4 Address Inventory Depleted

This incident has been resolved.

Oct 1, 19:52 - 20:21 UTC

Same server pops up…can somebody contact these guys and tell them it’s not resolved?

Thanks.

Not spam.

Hi Bas, sorry to hear you’re having problems. The best thing to do is log a ticket with your ISP so they can investigate and contact the peer and ask them to resolve / change their routing.

All of us that pass that server have problems.
The server is in the USA near the monitoring station, way after peering near my ISP.
So contacting them is of no use.
Your data-center should contact them, as they are probably a peer to them.
Opening a ticket here will do nothing, you need to contact packet-dot-net and tell them it’s not resolved at EWR1

The loss is there.

Why is this handled as spam?

Hi Bas, only Ask has the relationship with the provider at the monitor end. He is unlikely to respond for some time, so the only options are to ignore the problem and wait for it to improve on its own or for you to contact your ISP as you have the relationship with them. Other people have contacted their providers and they have happily looked at and resolved routing issues between their ends and the monitoring server.

Sorry I can’t be of more help :slightly_frowning_face: but you have to remember that the pool is a free service run by volunteers. :slight_smile: :+1:

Hi Elljay,

I understand that it’s all volunteers, I’m one too :slight_smile:
However, many of us get spammed with automated messages about being removed from the pool.
I’ve been investigating the matter for more than 2 weeks now and have tested everything possible.
And it turned out to be packet-dot-net that hasn’t fixed their faulty server, as I posted before.

I’m running 2 servers, and every time we hit EWR1 the packets are lost 100%.

I know what my ISP will tell me, that they have no relation with that peer and as such it’s not their problem.

However, I will contact packet myself and tell them about the problem, maybe it helps.

Thanks anyway,

Bas.

Yeay for volunteers! :grinning: :+1:

I agree it’s annoying - I was just writing a proposed update to the alert email that gets sent to add some troubleshooting steps as we seem to be getting quite a few queries recently!

I think I would push the ISP harder - you pay them to deliver traffic and they’re not doing it properly! :slight_smile: Having set up our own servers we probably have more knowledge than those who haven’t - I don’t think ISPs could expect most people to be able to diagnose where routing isn’t working or expect people to contact that peer directly! The routing is within the ISP’s and peers’ control, not ours. As I say we’ve had reports back from other people whose ISPs have contacted peers and got things fixed. Good luck! :slight_smile:

Hi Elljay,

Well, I used to be a troubleshooting engineer and a web host myself.
Doing this kind of stuff is fun!
Still hosting but just for friends, not to get paid.

I just got word from packet and they are going to contact NTPpool to sort it out.
Nice!

Bas.

Nice one! Hope they find the root cause! :+1:

Look what happened, they added two extra IPs to find the cause:

17.|-- 39.ae32.bbr2.ewr1.packet. 0.0% 10 91.5 92.0 91.4 94.1 0.6
18.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
19.|-- 147.75.98.107 0.0% 10 105.2 118.5 104.4 189.0 25.1

and

17.|-- 39.ae32.bbr2.ewr1.packet. 0.0% 10 134.2 96.1 91.6 134.2 13.4
18.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
19.|-- 147.75.98.105 0.0% 10 104.1 105.5 102.9 113.8 3.3

And then after a few “pings” 2 other servers pop up:

dsr1 + dsr2 both .ewr1 as extra…and 147.75.98.105 / 107 …

Previous runs didn’t show those…UDP still lost, but 95% and not 100% anymore.
Looks like some monitor of their own.

Bas.

In the meantime I got messages from Packet and NTPpool.

Let’s hope it’s sorted soon.

As it’s a major issue for many, including me…we are on life-support (-5 ~ -30) right now…but the next automated message will flat-line us :cold_sweat:

I for one am not ready to give up! :grinning:

We are getting somewhere: there is a problem with the monitoring.
They asked me to use the beta-server as well, so I did.

On the normal server ntp1.heppen.be scores rubbish, but on the beta-server it has no issues at all; its score is near 20.
But my second one, ntp2.heppen.be, is perfect on the normal server but rubbish on the beta-server, with a score of about 5.6.

As the beta server is in Amsterdam and not in the USA it has a different path.

In my opinion the monitor can’t be trusted if you have to follow a path that drops UDP messages.

The uptime-testing should be more robust and not assume your server is bad just because the path to you is broken.
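
To see whether NTP itself (UDP port 123) gets through, rather than the ICMP probes that mtr sends by default, a small client can send real NTP requests and count the replies. Below is a minimal Python sketch. The function names and the default attempt count and timeout are assumptions made for illustration, and it only measures the path from wherever you run it, not the path the monitor uses.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """48-byte client request: LI=0, VN=4, Mode=3 gives a first byte of 0x23."""
    return b"\x23" + 47 * b"\x00"


def parse_response(data):
    """Return (mode, stratum, transmit time as Unix seconds)."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    mode = data[0] & 0x07
    stratum = data[1]
    seconds, fraction = struct.unpack("!II", data[40:48])
    return mode, stratum, seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def count_replies(host, attempts=10, timeout=2.0):
    """Send `attempts` NTP requests over UDP and count usable replies."""
    replies = 0
    for _ in range(attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(build_request(), (host, 123))
                data, _addr = sock.recvfrom(512)
                parse_response(data)
                replies += 1
            except (OSError, ValueError):
                pass  # timeout, unreachable host, or malformed reply
    return replies
```

Running `count_replies("ntp1.heppen.be")` from a few different networks would show whether the loss depends on the path, which is what the differing scores on the two monitors suggest.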

If you have troubles with the blue-line and thus bad scores, it is suggested you enter your server here:

http://web.beta.grundclock.com/

Then wait an hour or so and see if your scores are ok. My “bad” server is shown ok there, so the monitor has a problem.

Look at the scores on different monitors:

Normal: [score graph not preserved]

Beta-server: [score graph not preserved]

Greetings Bas.

Since nobody has mentioned it yet, I’ll point out that the line
18.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
is meaningless. It doesn’t mean that the node is dropping packets, it means that the node is not responding to traceroute (which it is not required to do).

It’s clear that this is a non-issue because the subsequent lines in the trace show 0% loss. In general, people assign far too much value to traceroute. The best you can hope for is a series of nodes dropping packets, in which case you may be able to guess that the problems start at the first of those nodes. Without a trace from the other end, it’s a fairly weak guess because on the modern internet you’re very likely to have a completely different route in the other direction.
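
The rule of thumb above (loss at one hop only matters if it carries through to the destination) can be sketched in Python. This is an illustration only: the function names are made up, and the regular expression assumes report lines shaped like the ones posted in this thread.

```python
import re

# Matches lines such as "18.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0"
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)\s")


def parse_mtr_report(text):
    """Return a list of (hop number, host, loss percent) tuples."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2),
                         float(match.group(3))))
    return hops


def loss_reaching_destination(hops):
    """Hops whose loss persists through every later hop, the last included.

    Loss at a hop followed by a hop with 0% loss is treated as a router
    that merely declines to answer probes, so it is not reported.
    """
    suspicious = []
    for number, host, loss in reversed(hops):
        if loss <= 0.0:
            break
        suspicious.append((number, host, loss))
    return list(reversed(suspicious))
```

For the trace posted earlier this returns an empty list, because hops 19 and 20 answer with 0% loss. Even a non-empty result is only a guess about the forward path; it says nothing about the return route.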

One thing that might be nice to add in future would be the ability to see a trace from the monitoring station.

MTR was used…the monitor is behind EWR1 of packet.net and it fails there all the time.

All of this crap started when NTP-pool moved to packet.net; before that there were no problems.

I have been running NTP-servers for a long time and never ever got an email about being bad.

Then it moved to Packet and I get at least 1 a day!

There is.
http://trace.ntppool.org/traceroute/[your ip address goes here]
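
If you want to build that URL from a script, here is a minimal Python sketch. The helper name is made up, and validating the address first is my own addition; whether the service accepts IPv6 addresses is not something this thread confirms.

```python
import ipaddress

TRACE_BASE = "http://trace.ntppool.org/traceroute/"


def trace_url(address):
    """Build the traceroute URL for one IP address.

    Raises ValueError if `address` is not a valid IP address.
    """
    ip = ipaddress.ip_address(address)
    return TRACE_BASE + str(ip)
```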

halfway there. lacks packet loss stats.