"i/o timeout" from different monitoring stations

Likely someone at Zayo made a network change that had a negative impact. If the issue doesn’t resolve itself in the next few days, you can try contacting Zayo tech support, explaining the issue and sending that MTR report so they know which routers are causing problems.
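
If it helps, a report in that form can be produced with mtr’s report mode; the hostname below is only a placeholder for the actual monitoring station address:

```
# 100 probes, report mode, wide output, with AS numbers per hop
# (monitor.example.net is a placeholder for the monitoring station)
mtr -rwz -c 100 monitor.example.net
```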

Thank you elljay and littlejason - very much appreciated.

So I will sit on my hands and wait. Perhaps that is the best move, but after a year of never having had this problem, I was concerned - it has always been rock solid. A problem running from early Wednesday until today didn’t strike me as something that was going to go away on its own. Latest stats - production:

NTP 19Jan2020-3

And Beta:

I will report back tomorrow, hopefully to say case closed.

Once again, many thanks,

Dave.


Thank you for helping test and diagnose! (And of course for having a server in the pool) :slight_smile:

Still ‘sitting on my hands’, and I was about to post that things are resolved. The server has been in the pool since 10:00 UTC, but will likely drop out in a moment. However, this is the best performance since last Tuesday/Wednesday.

Here are the stats, Beta:

NTP 20Jan2020-Beta 4

And production:

NTP 20Jan2020-4

Back to sitting on my hands, I’ll update if anything remarkable happens, and any advice is of course welcome.

Many thanks,

Dave.

Hi Dave, my advice would be:

  • check that your server is working as you expect and that it can be accessed from the internet (Google a “check my ntp” site)
  • run mtr to the monitor and, if there’s something obvious, report it to your ISP to sort out.
  • if your score is >10 ignore it :slight_smile:
  • if it’s intermittent and it’s bugging you, I would work out when the checks from the monitor are due (you can watch them come in with the appropriate tcpdump recipe), then fire up mtr around the same time and see if the monitor packets arrive / get an answer / there’s an obvious drop along the route at that time (a rough sketch of the commands is below this list).
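
Something along these lines, with eth0 and 192.0.2.10 standing in for your interface and the monitoring station’s address:

```
# Watch the monitor's NTP queries arrive (and your replies go out)
tcpdump -ni eth0 udp port 123 and host 192.0.2.10

# Around the same time, trace the path using NTP's UDP port
mtr -rwz -c 60 --udp --port 123 192.0.2.10
```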

Thank you elljay - advice much appreciated.

Everything seems to have settled down now - ‘sitting on my hands’ seems to have worked.

Production stats:

NTP 24Jan2020

Beta stats:

NTP 24Jan2020 Beta

NTP server stats, which nicely show the active pool time since rebooting (everything):

So for me it is case closed.

Many thanks everyone for all the very helpful advice and pointers.

All the best,

Dave.


“Internet pipe cleaning complete. Normal service will now resume.” :grinning:

:tea:

Right, end-to-end is what matters. Loss at intermediate hops is irrelevant if you have no loss at the endpoint. So your output looks fine to me.

Hi, All!

I want to say: this gap (in the graph from the posts above) looks like the traffic route in transit between the monitoring station and the NTP server has changed.
A persistent offset in many cases arises from asymmetric routing delays. So, when the offset was roughly constant and then changes abruptly by some milliseconds and stays there, it is very likely that the network routing changed in transit and affected the delays. Check your traceroute to the pool and the reverse path (the debug link is posted a few posts earlier); a rough example is below the graph.
offset_gap
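
A minimal sketch of that forward check, with 192.0.2.10 standing in for the monitoring station (the reverse direction has to come from the debug page or the monitor’s side):

```
# Forward path from your server towards the monitoring station
traceroute -n 192.0.2.10

# Or collect per-hop loss/latency over a number of probes
mtr -rwz -c 100 192.0.2.10
```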

And there is good news about monitoring changes and development from Ask :slight_smile:

Jumps like that can also happen when your NTP program decides to choose a different source as its primary.
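
A quick way to check which source is currently selected (assuming ntpd or chrony, respectively):

```
# ntpd: the peer marked with '*' is the currently selected source
ntpq -pn

# chrony: the source marked '^*' is the one being synced to
chronyc sources -v
```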

Really, there are many reasons. But if you run a stratum 1 server and “change nothing”, then it looks like a change in route delays.