Monitoring problems with Croatian servers since Thursday 27/12/2018

monitoring

#1

Since the morning of 27/12 (around 8:00 UTC), the monitoring station on pool.ntp.org (LA) has regularly reported “i/o timeout” for all 5 of my servers in Zagreb, Croatia, at the Rudjer Boshkovich Institute.
This is especially problematic during daytime hours (UTC), whereas during the night the score goes up. At present all 5 servers are out, leaving the Croatian pool with only 3.

Monitoring the servers, I see no excess requests, congestion, or anything similar. The servers are accessible without a problem from within Croatia.

Furthermore, I cannot access web.beta.grundclock.com; it says “server not found”???

I attach a screenshot of the states of three servers from https://www.pool.ntp.org/user/zorko between 27/12, when it started, and 29/12. The problem still persists.

Otherwise, Happy New Year to everyone!


#2

Traceroutes were okay, but when I ran ntpdate against a couple of your servers, half the time it said something like “no valid time”… Probably a network issue somewhere way out of your control.
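
For anyone who wants to repeat this kind of check without ntpdate, here is a minimal Python sketch of a plain SNTP client query with a timeout. The function names (`build_request`, `parse_transmit_time`, `probe`) are made up for illustration, and the sketch only counts how many requests get a usable-looking reply; it does not validate the reply the way ntpdate does.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def build_request():
    """Return a 48-byte client request: LI=0, VN=3, Mode=3 (0x1B), rest zero."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp (bytes 40..47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def probe(host, attempts=10, timeout=2.0):
    """Send `attempts` requests to UDP port 123 and count the replies received.

    Timeouts and other socket errors count as a failed attempt. A late reply
    to an earlier request may be counted for a later one, which is acceptable
    for a rough reachability check.
    """
    answered = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(attempts):
            try:
                sock.sendto(build_request(), (host, 123))
                data, _addr = sock.recvfrom(512)
                parse_transmit_time(data)
                answered += 1
            except (OSError, ValueError):
                pass
    return answered


# Example (needs network access, so it is left commented out):
# print(probe("161.53.131.81"), "of 10 requests answered")
```

Running `probe` from several networks at different times of day would show whether the replies are lost only on some paths, which is what the “i/o timeout” entries from a single monitor cannot tell you.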


#3

From my vantage point, traceroutes over ICMP are not OK. MTR over TCP/80 reaches the host, but over ICMP it shows a drop at hop 13. Maybe there is rate limiting, a full buffer, or a network issue.

root@pve:~# mtr -r 161.53.131.81
Start: Mon Dec 31 09:49:44 2018
HOST: pve                          Loss%   Snt   Last    Avg   Best   Wrst  StDev
  1.|-- 172.17.250.1                0.0%    10    0.6    0.7    0.4    1.1    0.0
  2.|-- 87.120.166.1                0.0%    10    0.9    1.1    0.7    1.4    0.0
  3.|-- 212.73.142.233.neterra.ne  40.0%    10    7.0   63.8    7.0   95.3   30.4
  4.|-- xe-4-2-0-200.bar1.Sofia1.   0.0%    10    5.7    5.8    5.6    6.4    0.0
  5.|-- ae-2-3203.ear3.Frankfurt1  90.0%    10   47.6   47.6   47.6   47.6    0.0
  6.|-- Cogent-level3-200G.Frankf   0.0%    10   48.0   48.0   47.1   48.8    0.0
  7.|-- be2846.ccr42.fra03.atlas.   0.0%    10   42.1   42.8   42.1   44.7    0.5
  8.|-- be2960.ccr22.muc03.atlas.   0.0%    10   49.4   48.3   48.0   49.4    0.3
  9.|-- be3462.ccr52.vie01.atlas.  10.0%    10   48.2   48.0   47.5   48.7    0.0
 10.|-- be3465.rcr21.zag01.atlas.   0.0%    10   56.7   56.8   56.1   58.1    0.5
 11.|-- te0-0-2-0.nr11.b020911-1.   0.0%    10   57.0   56.8   56.4   57.3    0.0
 12.|-- 149.14.6.18                 0.0%    10   56.4   57.1   56.4   57.8    0.0
 13.|-- ???                        100.0    10    0.0    0.0    0.0    0.0    0.0

root@pve:~# mtr -Tr -P 80 161.53.131.81
Start: Mon Dec 31 09:44:49 2018
HOST: pve                          Loss%   Snt   Last    Avg   Best   Wrst  StDev
  1.|-- 172.17.250.1                0.0%    10    0.7    0.7    0.5    1.0    0.0
  2.|-- 87.120.166.1                0.0%    10    1.7    1.9    1.2    3.2    0.3
  3.|-- 212.73.142.233.neterra.ne   0.0%    10    6.0   20.6    5.5   90.6   26.5
  4.|-- xe-4-2-0-200.bar1.Sofia1.   0.0%    10    6.0    9.8    5.6   25.1    7.2
  5.|-- ae-2-3203.ear3.Frankfurt1  50.0%    10  7338.  2718.   47.6  7338. 2994.3
  6.|-- Cogent-level3-200G.Frankf   0.0%    10   48.3   48.1   47.6   49.7    0.5
  7.|-- be2846.ccr42.fra03.atlas.   0.0%    10   42.6   43.2   42.5   44.4    0.3
  8.|-- be2960.ccr22.muc03.atlas.   0.0%    10   48.8   48.5   47.8   49.6    0.3
  9.|-- be2974.ccr51.vie01.atlas.   0.0%    10   47.9   48.4   47.9   50.5    0.7
 10.|-- be2594.rcr21.zag01.atlas.   0.0%    10   56.3   56.8   56.1   58.2    0.5
 11.|-- te0-0-2-0.nr11.b020911-1.   0.0%    10   56.8   57.2   56.6   57.8    0.0
 12.|-- 149.14.6.18                 0.0%    10   56.9   57.3   56.6   58.3    0.0
 13.|-- ???                        100.0    10    0.0    0.0    0.0    0.0    0.0
 14.|-- ???                        100.0    10    0.0    0.0    0.0    0.0    0.0
 15.|-- ???                        100.0    10    0.0    0.0    0.0    0.0    0.0
 16.|-- grgur.irb.hr                0.0%    10   63.8  180.5   62.5  1088.  321.8
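
One way to read reports like the two above: loss at an intermediate hop that does not carry through to later hops is commonly just that router rate-limiting its own replies, while loss that persists to the last hop is more likely to matter. Below is a rough Python sketch of that rule of thumb. The helper names (`parse_hops`, `classify`) and the classification rule itself are my assumptions, not something mtr provides; the sample lines are copied from the ICMP report.

```python
import re

# Matches one hop line of an `mtr -r` report, for example:
#   "5.|-- ae-2-3203.ear3.Frankfurt1 90.0% 10 47.6 47.6 47.6 47.6 0.0"
# The "%" is optional because the "???" rows print the loss without it.
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)\b")


def parse_hops(report):
    """Return a list of (hop_number, host, loss_percent) tuples."""
    hops = []
    for line in report.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def classify(hops):
    """Split lossy hops into (transit_only, persistent) lists of hop numbers.

    Assumed rule of thumb: a hop's loss is "transit only" when some later hop
    shows less loss, otherwise it is "persistent".
    """
    transit_only, persistent = [], []
    for index, (number, _host, loss) in enumerate(hops):
        if loss == 0:
            continue
        later_losses = [later_loss for _, _, later_loss in hops[index + 1:]]
        if later_losses and min(later_losses) < loss:
            transit_only.append(number)
        else:
            persistent.append(number)
    return transit_only, persistent


SAMPLE = """\
  4.|-- xe-4-2-0-200.bar1.Sofia1. 0.0% 10 5.7 5.8 5.6 6.4 0.0
  5.|-- ae-2-3203.ear3.Frankfurt1 90.0% 10 47.6 47.6 47.6 47.6 0.0
  6.|-- Cogent-level3-200G.Frankf 0.0% 10 48.0 48.0 47.1 48.8 0.0
 12.|-- 149.14.6.18 0.0% 10 56.4 57.1 56.4 57.8 0.0
 13.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
"""

transit_only, persistent = classify(parse_hops(SAMPLE))
print("transit only:", transit_only, "persistent:", persistent)
```

On the sample, hop 5 comes out as transit-only and hop 13 as persistent. Persistent loss at the last visible hop of an ICMP run can still mean only that ICMP is filtered there, which is what the TCP run reaching grgur.irb.hr suggests.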


#4

Thank you for your effort. Something was going on in the Internet, as littlejason99 said, somewhere way out of my control. Thanks, nikolay, for your traceroutes. During that time I actually saw quite weird traceroutes going out from my servers, quite different from the incoming routes.

The problem was obviously solved somewhere, seemingly just when people started working again after New Year… :slight_smile: Everything has been normal since around 13:00 UTC on Wednesday 2/1/2019.

However, and this is why I sincerely support the idea of geodistributed monitoring servers for the NTP pool: from both the “Los Angeles, CA (4 samples)” and “Zurich (3 samples)” monitors I had a clean score of 20 during the whole time!
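
For context on why 20 is the ceiling: as far as I understand the pool's scoring, each check updates the score as score * 0.95 + step, with step = +1 for a good reply, so a server that always answers converges to 1 / (1 - 0.95) = 20. The sketch below simulates that. The update rule and the -5 penalty for a timeout are assumptions from memory of the pool documentation, not taken from this thread, and the function names are hypothetical.

```python
DECAY = 0.95         # assumed: fraction of the previous score kept per check
GOOD_STEP = 1.0      # assumed: step added for a good reply
TIMEOUT_STEP = -5.0  # assumed: step added for an i/o timeout


def update(score, step):
    """Apply one monitoring check to the score."""
    return score * DECAY + step


def simulate(score, steps):
    """Apply a sequence of steps in order and return the final score."""
    for step in steps:
        score = update(score, step)
    return score


def checks_to_reach(start, target, step=GOOD_STEP, limit=10_000):
    """Count how many consecutive checks with `step` lift the score to `target`."""
    score, count = start, 0
    while score < target:
        if count >= limit:
            raise ValueError("target not reached within limit")
        score = update(score, step)
        count += 1
    return count


# A long run of good replies approaches the ceiling of 20.
print(round(simulate(0.0, [GOOD_STEP] * 200), 3))

# Three timeouts in a row starting from a full score.
print(round(simulate(20.0, [TIMEOUT_STEP] * 3), 3))
```

Under these assumptions a few timeouts per day are enough to keep a server well below 20, which would explain why the score recovered at night and dropped again during the day on the one monitor whose path was affected.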