Beta system now has multiple monitors

beta
monitoring

#24

Hi,

The same goes for me. In the production pool everything is OK with my servers, but from the beta system I get e-mails saying that my servers have a problem.

https://web.beta.grundclock.com/user/urmvj7ct6ryi2dx6zmq

My restrict settings:
restrict default limited kod nomodify notrap nopeer noquery
restrict -6 default limited kod nomodify notrap nopeer noquery

I have no idea what is going on there.


#25

Hi,

It looks like the “4 samples” monitor was getting “not in sync” responses; maybe it’s being rate limited? Which version of ntpd do you run? I wonder if the monitor is more aggressive than ‘ntpdate’ (that is what I was trying to emulate with the 4 requests, 2 seconds apart).

https://web.beta.grundclock.com/scores/78.46.60.40/log?limit=500&monitor=12
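
For anyone checking a packet capture: as far as I understand it, a rate-limited ntpd configured with `limited kod` replies with a kiss-o'-death packet, which carries leap indicator 3 (unsynchronized), stratum 0, and the ASCII reference ID `RATE`. That would look like "not in sync" to a monitor. Below is a minimal Python sketch for classifying a raw reply from the standard NTP header layout. The function name and the returned labels are my own, not anything from the monitor's code.

```python
import struct


def classify_ntp_response(packet: bytes) -> str:
    """Classify a raw NTP server reply from its header fields."""
    if len(packet) < 48:
        return "short packet"
    # Byte 0 packs leap indicator (2 bits), version (3 bits), mode (3 bits).
    li_vn_mode, stratum = struct.unpack("!BB", packet[:2])
    leap = li_vn_mode >> 6
    refid = packet[12:16]  # reference ID field, bytes 12..15
    if stratum == 0:
        # Stratum 0 marks a kiss-o'-death packet; the refid holds an ASCII code.
        code = refid.decode("ascii", errors="replace")
        return f"kiss-o'-death ({code})"
    if leap == 3:
        return "not in sync"
    return "ok"
```

A reply classified as `kiss-o'-death (RATE)` would point at rate limiting rather than at a server that has really lost sync.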


#26

Hi,

Both time servers run Ubuntu 16.04 with ntpd version 4.2.8p4.

After reviewing some online sources and this thread, I added the following to the configuration: discard average 2 minimum 1. Since then it seems to be OK, so I think this was an issue with rate limiting.
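
To make the units explicit: as I read the 4.2.8 access control documentation, `average` is given in log2 seconds (default 3) and `minimum` in seconds (default 2). A small Python sketch of the conversion, with a helper name I made up for illustration:

```python
def discard_settings_in_seconds(average: int = 3, minimum: int = 2) -> dict:
    """Convert ntpd 'discard' values to seconds.

    The defaults (average 3, minimum 2) are the ones I believe the
    4.2.8 documentation lists; treat them as an assumption.
    """
    return {
        "average_headway_s": 2 ** average,  # 'average' is in log2 seconds
        "guard_time_s": minimum,            # 'minimum' is in seconds
    }


print(discard_settings_in_seconds())      # defaults: 8 s average, 2 s guard time
print(discard_settings_in_seconds(2, 1))  # values above: 4 s average, 1 s guard time
```

So the change halves both thresholds compared with the defaults, which makes the server more tolerant of closely spaced queries.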


#27

[ eh, I was quoting 4.1.1 docs – no wonder I was confused by this! So edited below ]

I put ‘limited’ in the default configuration suggestions on http://www.pool.ntp.org/join/configuration.html – so assuming that’s reasonable then the monitoring system should be working with that default configuration.

The system should be waiting 2 seconds between each query, and the default “minimum” setting is also 2 seconds, so I don’t understand why it’s not working. @bhueske I haven’t taken the time to read the tcpdump you sent properly. Can you summarize it for me (and everyone else)?

http://doc.ntp.org/current-stable/accopt.html


#28

Do the monitoring systems for the normal pool and beta pool query independently from the same IPs? If so, and a server is in both pools, and both happen to query at around the same time, that could be a problem. But that sounds quite rare.
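
To illustrate the collision I have in mind, here is a rough Python sketch. It is a simplified model that only checks the guard time ("minimum") between consecutive packets from one source IP; it does not model ntpd's average headway accounting, and the schedules are made up.

```python
def guard_time_violations(arrivals, guard_time=2.0):
    """Return arrival times that follow the previous packet
    from the same source IP by less than guard_time seconds."""
    ordered = sorted(arrivals)
    return [t for prev, t in zip(ordered, ordered[1:]) if t - prev < guard_time]


# Hypothetical query times (seconds) for two monitors sharing one source IP.
production = [0.0, 900.0]
beta = [0.5, 600.0]

# The server only sees the merged stream from that IP.
print(guard_time_violations(production + beta))  # [0.5]
```

Each monitor on its own is well within the limit; only the merged stream trips the guard time, and only when the two schedules happen to line up.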


#29

The “4 samples” monitor should be using a separate IP. The “normal” one shares an IP between the normal pool and the beta pool (currently).


#30

Yikes! I think the “Los Angeles (4 samples)” beta monitor was running old code that wasn’t waiting 2 seconds between each query. I fixed it just now. Hopefully that solves the unexpected errors. :-/ Sorry about that!
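
For reference, the intended behaviour is roughly the following. This is a Python sketch only, not the monitor's actual code; `send_query` stands in for whatever sends one NTP request and returns its result.

```python
import time


def run_paced_queries(send_query, count=4, spacing=2.0, sleep=time.sleep):
    """Call send_query() count times, pausing spacing seconds between calls."""
    results = []
    for i in range(count):
        if i > 0:
            sleep(spacing)  # wait before every query except the first
        results.append(send_query())
    return results
```

The old code effectively skipped the pause, so all 4 requests arrived back to back and would fall inside a server's guard time.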


#31

It looks like there may still be some issues with the LA (4 samples) beta monitor (either that or I’m having inexplicable routing issues with only that monitoring node). Every other monitor in the beta system, plus all the ones on the main system, is having no issues with my systems, but that LA (4 samples) beta monitor is consistently failing to talk to my systems.