Monitors belgg1-19sfa9p and belgg2-19sfa9p having hiccups?

Hi @bas,

Could you kindly check your monitors belgg1-19sfa9p and belgg2-19sfa9p? For a few days now, they have very occasionally been reporting unusually high offsets, sometimes larger than 1 second.

ts_epoch,ts,offset,step,score,monitor_id,monitor_name,leap,error
1730034823,2024-10-27 13:13:43,0.001598066,1,15.797745705,41,belgg2-19sfa9p,,
1730032413,2024-10-27 12:33:33,-0.315106736,-0.260426939,15.576574326,41,belgg2-19sfa9p,,
1730029998,2024-10-27 11:53:18,0.000873653,1,16.670526505,41,belgg2-19sfa9p,,
1730029686,2024-10-27 11:48:06,0.001589592,1,16.495292664,41,belgg2-19sfa9p,,
1730029234,2024-10-27 11:40:34,-1.213118746,-2,16.310832977,41,belgg2-19sfa9p,,
1730028873,2024-10-27 11:34:33,0.002238161,1,19.274560928,41,belgg2-19sfa9p,,
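
For anyone who wants to check a monitor's log for the same pattern, here is a minimal sketch that flags the outliers in CSV data like the above. The function name `find_outliers` and the 0.1 s threshold are my own assumptions for illustration, not anything the pool uses.

```python
import csv
import io

# The six rows quoted above, unchanged.
SAMPLE = """ts_epoch,ts,offset,step,score,monitor_id,monitor_name,leap,error
1730034823,2024-10-27 13:13:43,0.001598066,1,15.797745705,41,belgg2-19sfa9p,,
1730032413,2024-10-27 12:33:33,-0.315106736,-0.260426939,15.576574326,41,belgg2-19sfa9p,,
1730029998,2024-10-27 11:53:18,0.000873653,1,16.670526505,41,belgg2-19sfa9p,,
1730029686,2024-10-27 11:48:06,0.001589592,1,16.495292664,41,belgg2-19sfa9p,,
1730029234,2024-10-27 11:40:34,-1.213118746,-2,16.310832977,41,belgg2-19sfa9p,,
1730028873,2024-10-27 11:34:33,0.002238161,1,19.274560928,41,belgg2-19sfa9p,,
"""


def find_outliers(csv_text, threshold_s=0.1):
    """Return (timestamp, offset) pairs whose absolute offset exceeds threshold_s.

    threshold_s is a hypothetical cut-off chosen only for this example.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["ts"], float(row["offset"]))
        for row in reader
        if abs(float(row["offset"])) > threshold_s
    ]


outliers = find_outliers(SAMPLE)
for ts, offset in outliers:
    print(f"{ts}  offset={offset:+.6f} s")
```

With the sample above, this picks out the two samples at 12:33:33 and 11:40:34 and leaves the millisecond-range ones alone.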

This affects all my servers across four locations in Germany and two in Singapore. It affects both IPv4 and IPv6. It also affects your own servers in multiple locations.

So I think it is very unlikely that this is a server-side issue, but rather points to a monitor-side one.

It is not a big issue from a functional point of view, as it happens only rarely and there obviously is a sufficient number of other monitors. But it badly skews the “Offset and scores” graphs.

It seems this roughly started around the time when belgg1-19sfa9p started monitoring IPv6 servers again. I obviously don’t know whether there is a causal connection, or whether the two things are completely unrelated.

Thanks!

Belg2 wasn’t supposed to work.

The problem is the router that is in use at the moment.
It cannot handle the number of requests.

I did switch to a faster router, but our monopolistic ISP won’t allow it.
However, from 1 Nov on they are forced to lift the restriction, and I will put the faster router back in place.

But it can’t affect your server, as timeouts only remove my testing-server from being used as test-server.

If you look:

https://www.ntppool.org/scores/87.118.104.17

and

https://www.ntppool.org/scores/2603:c020:8017:3e00::123

My server is taken OFF the scoring for your server. Whatever it produces, it will not be used to rate your scoring.

That is the beauty of the new scoring system, only active-monitors give scoring.
Any server, like mine at the moment, not scoring well all the time, is taken ‘offline’ for you.

Thanks for your reply!

If you reread my message more carefully, you’ll see that I wrote that it is not a functional issue for my or other servers, in the sense of affecting the scoring.

It is the graphs for many servers that are getting messed up to the point of being unusable.

Is there a way to disable your monitors until such time as the issue with your router has been addressed?

Yeah I read that, but you complain to me :rofl:

When in fact you must complain to @ask that he should stop including all servers in the graph.

I can do nothing about it except take all monitors offline, and the issue isn’t constant.

It’s just the router: currently a FritzBox 7530AX, which simply doesn’t cope and slows down.
I tried to relieve it by taking the DNS server off it and putting it on internal servers, but that doesn’t help.

It’s the firewall/nat tables that grow too big in my opinion.

Next Friday I can use the Fritzbox 5690Pro again, that is incredibly faster and doesn’t have this problem.

Last week my ISP, not the monopolistic one, did a test on VDSL and my router was set back to 8 Mbit down and 512 Kbit up, while it’s normally 100/35 Mbit.
But it will be fixed soon…then finally free modem-router choice is forced on the monopolistic ISP that controls the network.

Sorry my friend…not much I can do at the moment.

However, I did run over a month with the new router, but they noticed :face_with_symbols_over_mouth:

But still you responded only to the issue that I mentioned was a non-issue.

How about just stopping the monitoring processes on the respective monitor machines?

I can do that. No problem.

I have stopped them now.

Many thanks!

Good luck with the new old router then on Nov 1!

Help me remember to turn them back on by then :rofl:

Yes, I will. Might not be on Nov 1 right away, depending on when I get around to it. But I made a note of this.

We will see…can’t wait to put the new router in place, as it is so much faster it’s not funny anymore.
The forced routers simply hinder internet performance.

And it’s simply not possible to install better…can’t wait till it’s 1st nov.

Belgium really ***** on these matters.

As the fiber guys here face the same problem, they are forced to use the Proximus ONT instead of their own.

As the state owns the monopolistic ISP Proximus…they do not do anything unless Europe forces them. Thank god the EU forces this upon us :ok_hand:

Operators in Germany in the past also tried to redefine the network termination point to be an operator-provided device, rather than the end of the physical line. But so far, luckily, the regulator has resisted such lobbying (certainly for copper-based access, but I think for optical fiber as well)…

BELGG2 has been largely a random number generator from here. The only harm done would be the forced re-scaling of the entire graph.

I’ve got some code here that stops that from happening in a dynamic graphing system, but it’s now almost 30 years old, so unlikely to be a significant contribution.

I am a bit unsure how to deal with this. The current graphing already has some offset-dependent scaling so that occasional “bigger” offsets don’t impact the visibility and differentiation of the “lesser” offsets - that hopefully by far exceed the number of the larger outliers - too much. But that obviously has its limits. So how to not completely exclude the large outliers, as they are interesting to see at a glance as well, but still keep the better resolution in the more interesting, lower offset areas?
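
One generic way to get that trade-off, sketched here purely for illustration, is a symmetric-log style axis: roughly linear near zero, logarithmic further out. This is not taken from the pool's graphing code (I have not checked how its existing offset-dependent scaling works); the function name `symlog` and the 5 ms linear width are assumptions.

```python
import math


def symlog(offset_s, linear_width_s=0.005):
    """Map an offset to a display coordinate.

    Behaves almost linearly for |offset| well below linear_width_s and
    logarithmically above it, keeping the sign. linear_width_s is a
    hypothetical tuning knob, not a value used by the pool.
    """
    return math.copysign(math.log1p(abs(offset_s) / linear_width_s), offset_s)


# Offsets from the samples quoted at the top of the thread.
for offset in (0.000873653, 0.001598066, -0.315106736, -1.213118746):
    print(f"{offset:+.6f} s -> {symlog(offset):+.3f}")
```

The large outliers stay on the chart and keep their sign, but they no longer stretch the axis by orders of magnitude relative to the millisecond-range samples.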

Have you previously taken a look at the current approach, as documented in the GitHub repositories, to see whether you could build on that? I don’t understand enough of the topic and the code to contribute to this, and also Ask has previously signaled that the graphing is an area he prefers not to touch too much, given it is in a language that he doesn’t routinely deal with (if I understood correctly). But if there were specific proposals, I think he might, generally, be open to that (which however doesn’t mean he’ll be able to actually spend time on this, given his other constraints and priorities vs. available timeslots, so it also might lead to nowhere after all, one never knows beforehand).

I should state that it would need to be a user-specified behaviour, i.e. the user can set some specified upper limit to the allowable rescale, and specify other modifications to the display. This potentially moves the rendering operation from the server to the client.
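
As a rough sketch of what such a user-set limit could look like on the client side (the function name and return shape are hypothetical, and this is not the 30-year-old code mentioned above): values beyond the limit are pinned to the edge of the plot, and their positions are returned so the renderer can mark them as clipped instead of silently hiding them.

```python
def clamp_for_display(offsets_s, max_abs_s):
    """Clamp offsets to +/- max_abs_s for plotting.

    Returns (clamped_values, clipped_indices). max_abs_s stands in for
    the user-specified upper limit on the rescale.
    """
    clamped = []
    clipped = []
    for i, value in enumerate(offsets_s):
        if abs(value) > max_abs_s:
            # Pin to the plot edge, keeping the sign of the outlier.
            clamped.append(max_abs_s if value > 0 else -max_abs_s)
            clipped.append(i)
        else:
            clamped.append(value)
    return clamped, clipped


values, clipped = clamp_for_display([0.001, -1.2, 0.3], max_abs_s=0.5)
print(values)   # values to plot
print(clipped)  # indices to draw with an "off-scale" marker
```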

My own background is in data rendering for automotive engineers who have to deal with large volumes of multi-channel data. At an earlier time, there was the “eight brush” data recorder, which gave you miles of paper traces with up to 8 channels of data. When the engineers got a comma- or tab-delimited data file instead, the tool I gave them would let them see the data in an auto-scaled form, or with any arbitrary modifications to scale and graph position for any channel, as well as presenting the data in a tabular format. It was written for a VGA display on an MS-DOS platform when I was a contractor at Ford Motor Company back in the '80s/'90s. Code not NDA’d, so free to good home.

Hi @Bas, as promised a kind reminder to re-enable and restart the monitor instance that you want to be running after your router switch.

Thanks, they are back online for testing.

If they give problems again, I will take them offline.

Let’s see how it goes. Both are running IPv4/6.

Well, for some reason the modems go nuts over the monitors, it seems.
But also on other UDP traffic.

Ergo, I changed the setup again.

The VDSL2 modem is now a VMG4005-B50A in Bridge mode.
So the router doesn’t need to do DSL-stuff anymore.
Also the Zyxel protects against DDOS on my system, also a thing the router doesn’t need to do.

The router is now the Fritzbox 5690Pro, in Fiber-mode, it should be plenty fast.

I only run IPv4 at the moment, to see if it stays stable.

I did notice a reduced delay between my system and other time servers, from 13 ms to as little as 10 ms.

That is a lot in my book over just a modem change.

If this keeps stable, I will add IPv6 to it, and see how it goes from there.

If problems arise again, I will have my ISP check all street cables from here to the ISP itself. They already offered this, but I declined as the Fritzbox 7530AX was stable, but only without monitors running.
So in my book that means it has something to do with CPU and DSL-chip communication inside the modem.

We will see…there are no other options left apart from taking cable, but they are very expensive, no static-ip and limited traffic.

Please let me know if the IPv4 monitor goes wrong, for EU-servers.

Hmm, out of curiosity, how much traffic is a single monitor creating, and what are the CPU, memory, and potentially disk requirements?

I always thought the monitors’ traffic is somewhat light in comparison to what an individual NTP server could potentially see, even when there’s some roughly 3000 servers (for IPv4) to be monitored.

E.g., considering such optimizations as only a subset of monitors polling a specific server at the highest rate (“active” monitors, and maybe a few beyond that), and others reducing their poll rate when in “testing” mode only for a specific server.
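
To put a rough shape on that question, here is a back-of-the-envelope sketch. Only the 3000-server figure comes from this thread; the share of servers polled at the "active" rate and both poll intervals are placeholders I made up, since I don't know the real values. The packet size is 48 bytes of NTP payload plus 8 bytes UDP and 20 bytes IPv4 header, ignoring link-layer framing.

```python
NTP_PAYLOAD_BYTES = 48   # unauthenticated NTP packet
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20   # no IP options; link-layer framing ignored


def estimate_monitor_traffic(servers, active_fraction,
                             active_interval_s, testing_interval_s,
                             packets_per_check=1):
    """Estimate outbound packets/s and bits/s for one monitor.

    All arguments are assumptions supplied by the caller; none of them
    reflect the pool's real configuration.
    """
    active = servers * active_fraction
    testing = servers - active
    checks_per_s = active / active_interval_s + testing / testing_interval_s
    packets_per_s = checks_per_s * packets_per_check
    packet_bytes = NTP_PAYLOAD_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES
    return packets_per_s, packets_per_s * packet_bytes * 8


# Hypothetical inputs: half the servers polled every 10 minutes,
# the other half every 50 minutes, one packet per check.
pps, bps = estimate_monitor_traffic(3000, 0.5, 600, 3000)
print(f"{pps:.1f} packets/s out, {bps:.0f} bit/s out")
```

Replies roughly double the totals. Plugging in the real intervals would show whether raw bandwidth or the number of distinct destinations is the more likely stress on a home router.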

Bas,
good to hear you are making some progress on the fritz box issue.

As you may be aware, I have also been struggling with a Fritz box. They seem optimized for TCP traffic and deal badly with small packets such as UDP.

The best results I got were from turning hardware packet acceleration off and increasing the packet size. YMMV

Don’t have experience with the 5690pro.
Please keep me informed on your findings.

It’s better to say “small packets such as NTP”. Unauthenticated NTP packets are 48 bytes, vs. 64 bytes for ping’s default ICMP echo requests. Having a busy NTP server port-forwarded behind a NAT is more challenging than most NAT scenarios due to the large number of “connections” (UDP is connectionless, but that’s nitpicking). Most TCP and UDP packets are substantially larger, and there are many packets in bursts going through the same NAT mapping, while public NTP servers require the NAT to set up mappings that are typically used for just one extra-small packet in each direction before the mapping times out (after 30s or so). So the amount of NAT work setting up and tearing down mappings for each kilobit of NTP server throughput is much higher than for other types of traffic.
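
A small sketch of that arithmetic, under simplifying assumptions: every request opens a fresh mapping that carries one 48-byte packet each way and then idles until the timeout. The request rate in the example is made up; the 30 s timeout is the rough figure mentioned above.

```python
NTP_PAYLOAD_BYTES = 48


def concurrent_nat_mappings(new_mappings_per_s, mapping_timeout_s=30.0):
    """Steady-state table size: arrival rate times mapping lifetime."""
    return new_mappings_per_s * mapping_timeout_s


def mappings_per_kilobit(payload_bytes_per_mapping):
    """How many mappings must be created to move 1000 bits of payload."""
    return 1000 / (payload_bytes_per_mapping * 8)


# Hypothetical load of 1000 distinct NTP exchanges per second.
table_size = concurrent_nat_mappings(1000)
ntp_cost = mappings_per_kilobit(2 * NTP_PAYLOAD_BYTES)   # one packet each way
bulk_cost = mappings_per_kilobit(1_000_000)              # e.g. a 1 MB transfer on one mapping
print(f"{table_size:.0f} mappings held at once")
print(f"NTP: {ntp_cost:.2f} mappings per kilobit, bulk flow: {bulk_cost:.6f}")
```

Under these assumptions the NAT table holds tens of thousands of entries at a throughput that is tiny in bandwidth terms, which fits the observation that the router struggles even though the link is nowhere near saturated.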