Server monitoring

What is happening with the monitoring out of San Jose? Is it just me?

The monitoring out of Amsterdam seems fine though.

A sudden jump in offset followed by a jump back usually indicates a change in the routing between your NTP server and the monitoring station.
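The standard NTP on-wire calculation (RFC 5905) shows why a routing change appears as an offset step. In the derivation below, T_1 and T_4 are the monitor's send and receive times, T_2 and T_3 are the server's receive and send times, θ_0 is the true offset of the server clock relative to the monitor, and d_f and d_b are the one-way delays monitor → server and server → monitor. These symbols are chosen here for illustration.

```latex
\begin{aligned}
\theta &= \frac{(T_2 - T_1) + (T_3 - T_4)}{2}, &
\delta &= (T_4 - T_1) - (T_3 - T_2), \\
T_2 &= T_1 + d_f + \theta_0, &
T_4 &= T_3 + d_b - \theta_0, \\
\Rightarrow\; \theta &= \theta_0 + \frac{d_f - d_b}{2}, &
\delta &= d_f + d_b .
\end{aligned}
```

Only the sum d_f + d_b is observable, so any path asymmetry cannot be separated from θ_0. If a route change alters one direction by Δ, the measured offset shifts by Δ/2 and shifts back when the route reverts.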

Here is some additional data to support NTPman’s response. This plot shows additional delay information as seen from the San Jose monitor:


The top plot shows the one-way delays (this assumes that both hosts have accurate time).
The middle plot shows the computed offset, similar to that shown in the server monitoring plot.
The bottom plot shows the round-trip time, which is insensitive to errors in either host’s clock.
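The minimal Python sketch below derives the three plotted quantities from the four timestamps of a single NTP exchange. It uses the standard RFC 5905 formulas. The class and function names are hypothetical, and the plotting tool above may not compute them this way.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NtpSample:
    t1: float  # monitor transmit time (monitor clock), seconds
    t2: float  # server receive time (server clock)
    t3: float  # server transmit time (server clock)
    t4: float  # monitor receive time (monitor clock)


def one_way_delays(s: NtpSample) -> tuple[float, float]:
    """Forward (monitor -> server) and return (server -> monitor) delays.

    Only meaningful if both clocks are accurate: a server clock offset
    is added to the forward delay and subtracted from the return delay.
    """
    return s.t2 - s.t1, s.t4 - s.t3


def offset(s: NtpSample) -> float:
    """Standard NTP offset estimate: server clock minus monitor clock."""
    return ((s.t2 - s.t1) + (s.t3 - s.t4)) / 2


def round_trip(s: NtpSample) -> float:
    """Round-trip delay excluding the server's processing time.

    Clock offsets cancel: t4 - t1 uses only the monitor's clock and
    t3 - t2 uses only the server's clock.
    """
    return (s.t4 - s.t1) - (s.t3 - s.t2)
```

Take an example with 70 ms out, 50 ms back, and perfectly synchronized clocks, `NtpSample(0.0, 0.070, 0.071, 0.121)`. It reports a 10 ms offset and a 120 ms round trip. The path asymmetry alone produces that offset.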

The NTP response delay (your server → San Jose) decreased briefly. The impact on the NTP offset seen from San Jose is only 15 ms, which isn’t affecting scores. If you want to dig deeper, run traceroutes from your NTP server towards 2604:1380:1001:d600::1, San Jose’s IPv6 address.
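One way to catch a routing change is to save a traceroute and compare it with a later run. The Python sketch below shows this approach. It assumes a Linux-style `traceroute` binary that accepts `-6` and `-n`; macOS, for example, uses `traceroute6` instead. The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess


def run_traceroute(target: str) -> str:
    """Run the system traceroute against an IPv6 target and return its output.

    Assumes a Linux-style `traceroute` that accepts -6 (IPv6) and -n
    (numeric output, no reverse DNS); other platforms may differ.
    """
    result = subprocess.run(
        ["traceroute", "-6", "-n", target],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_hops(output: str) -> list[str]:
    """Return one address per hop, or '*' for hops that never answered."""
    hops = []
    for line in output.splitlines():
        fields = line.split()
        # Hop lines start with the hop number; this skips the header line.
        if not fields or not fields[0].isdigit():
            continue
        # The first non-'*' field after the hop number is the replying address.
        hops.append(next((f for f in fields[1:] if f != "*"), "*"))
    return hops


def changed_hops(old: list[str], new: list[str]) -> list[int]:
    """1-based hop numbers whose address differs between two runs."""
    length = max(len(old), len(new))
    return [
        i + 1
        for i in range(length)
        if (old[i] if i < len(old) else None) != (new[i] if i < len(new) else None)
    ]
```

For example, store `parse_hops(run_traceroute("2604:1380:1001:d600::1"))` while the offset looks normal. Run it again during a jump and pass both lists to `changed_hops`. Hops that never answer show up as `*`, so an intermittent timeout can look like a change.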

It’s not only you; it happens to all of us. The monitors have poor or overloaded peering to the datacenters, which causes timeouts and, as a result, drops in the graph.
The monitors are not stable.

We have all complained about them for years.

OK, but what’s going on with the sampling interval? It used to be 1024 s, but now it seems to vary.

ts_epoch,ts,offset,step,score,monitor_id,monitor_name,leap,error
1633226768,"2021-10-03 02:06:08",-0.01438497,1,20,10,"San Jose, CA, US",0,
1633226768,"2021-10-03 02:06:08",-0.01438497,1,20,,,0,
1633224895,"2021-10-03 01:34:55",-0.013908964,1,20,10,"San Jose, CA, US",0,
1633224895,"2021-10-03 01:34:55",-0.013908964,1,20,,,0,
1633219448,"2021-10-03 00:04:08",-0.014651828,1,20,10,"San Jose, CA, US",0,
1633219448,"2021-10-03 00:04:08",-0.014651828,1,20,,,0,
1633216445,"2021-10-02 23:14:05",-0.014050207,1,20,10,"San Jose, CA, US",0,
1633216445,"2021-10-02 23:14:05",-0.014050207,1,20,,,0,
1633214859,"2021-10-02 22:47:39",-0.015255415,1,20,10,"San Jose, CA, US",0,
1633214859,"2021-10-02 22:47:39",-0.015255415,1,20,,,0,
1633212638,"2021-10-02 22:10:38",-0.014175246,1,20,10,"San Jose, CA, US",0,
1633212638,"2021-10-02 22:10:38",-0.014175246,1,20,,,0,
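The actual gaps can be checked directly from the exported log. The Python sketch below keeps one monitor's rows and differences the `ts_epoch` values. It assumes that rows with an empty `monitor_id` repeat each sample as the overall score, which is how the excerpt above reads. The function name is hypothetical.

```python
from __future__ import annotations

import csv
import io


def sample_intervals(csv_text: str, monitor_name: str) -> list[int]:
    """Seconds between consecutive samples from one monitor.

    Rows with an empty monitor_id are skipped; in this export they appear
    to duplicate each monitor sample as the overall score.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    stamps = sorted(
        int(row["ts_epoch"])
        for row in reader
        if row["monitor_id"] and row["monitor_name"] == monitor_name
    )
    # Difference consecutive timestamps (oldest first).
    return [b - a for a, b in zip(stamps, stamps[1:])]
```

Applied to the rows above with `"San Jose, CA, US"`, the function returns `[2221, 1586, 3003, 5447, 1873]`. That means gaps of roughly 26 to 91 minutes rather than 1024 s.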

This seems to affect IPv4 much more than IPv6, and Europe more than elsewhere. Last night it was bad again over here, to the extent that it starts to pose a risk to the stability of the pool, because quite a few servers might suddenly be dropped from it.

It seems that the beta with multiple vantage points has fewer problems.

[Screenshot 2021-10-16 at 10.27.12]

[Screenshot 2021-10-16 at 10.27.26]

The monitor is crap and has been for years.
I’m done.

All servers deleted again.

It will never be resolved.

Ask doesn’t do anything to fix his monitor.

The monitor is the Achilles heel of the pool. I’d love to help, if only @ask asks for it.

It’s sad. Since the San Jose monitor came into use, things have gotten worse for me as well, and the Newark monitor was already bad.

The beta system is doing much better and also has a monitoring station in the EU. Sadly, it looks like it will stay in beta forever. I have no idea why all servers are still monitored from a US machine when most pool machines are based in Europe.

A few years ago I wrote an entire paper on how the monitor should work.
Nothing happened.

Loads of good servers are being de-listed only because the monitor fails to reach them.
I have spent a lot of time tracking down the problem, and it has led back to the monitor every time.

My conclusion is that @ask has no interest in this project and ignores all input on improving it.

There are a lot of people here that want to fix the monitor issue, but ‘management’ doesn’t care one bit.

The pool is heading nowhere, as there is no intention to fix issues.
This has been going on for years and years.