It looks like the “4 samples” monitor was getting “not in sync” responses; maybe it’s being rate limited? Which version of ntpd do you run? I wonder if the monitor is more aggressive than ‘ntpdate’ (that is what I was trying to emulate with the 4 requests, 2 seconds between each).
Both time servers are on Ubuntu 16.04 with ntpd version 4.2.8p4.
After reviewing some online sources and this thread, I added the following to the configuration: discard average 2 minimum 1. Since then it seems to be OK. I think this was an issue with rate limiting.
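For anyone who wants to reproduce the query pattern described above (4 requests, 2 seconds apart), here is a minimal sketch in Python. It is not the pool monitor's actual code; the function names, the timeout, and the injectable `send`/`sleep` parameters are assumptions made for illustration. It only reads the leap indicator and stratum from each reply, where a leap indicator of 3 means the server reports itself as unsynchronized.

```python
import socket
import time

NTP_PORT = 123


def build_request(version=4):
    """Return a 48-byte client-mode NTP request (LI=0, VN=version, Mode=3)."""
    first_byte = (0 << 6) | (version << 3) | 3
    return bytes([first_byte]) + bytes(47)


def send_udp(host, packet, timeout=2.0):
    """Send one packet over UDP and return the raw reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, NTP_PORT))
        try:
            reply, _ = sock.recvfrom(512)
            return reply
        except socket.timeout:
            return None


def leap_and_stratum(reply):
    """Extract (leap indicator, stratum) from the first two bytes of a reply."""
    if len(reply) < 2:
        return None
    return reply[0] >> 6, reply[1]


def query_samples(host, samples=4, spacing=2.0, send=send_udp, sleep=time.sleep):
    """Query `host` `samples` times, sleeping `spacing` seconds between queries."""
    results = []
    for i in range(samples):
        if i > 0:
            sleep(spacing)  # the delay the old monitor code was missing
        reply = send(host, build_request())
        results.append(None if reply is None else leap_and_stratum(reply))
    return results
```

Calling `query_samples("ntp.example.net")` (a placeholder hostname) returns one entry per query: `None` for a timeout, otherwise the leap indicator and stratum of that reply.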
[ eh, I was quoting 4.1.1 docs – no wonder I was confused by this! So edited below ]
I put ‘limited’ in the default configuration suggestions on http://www.pool.ntp.org/join/configuration.html – so assuming that’s reasonable then the monitoring system should be working with that default configuration.
The system should be waiting 2 seconds between each query and the default “minimum” setting is that, so I don’t understand why it’s not working. @bhueske I haven’t taken time to read the tcpdump you sent properly. Can you summarize it for me (and everyone else)?
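To make the spacing argument above concrete, here is a deliberately simplified model of a per-client guard time. It is not ntpd's actual algorithm: it ignores the `average` setting entirely, and it assumes the "last seen" time is updated on every packet, dropped or not. The function name and the arrival times are made up for illustration.

```python
def simulate_guard_time(arrivals, minimum=2.0):
    """Return one bool per arrival: True if accepted, False if dropped.

    Simplified model (an assumption, not ntpd's implementation): a packet
    is dropped when it arrives less than `minimum` seconds after the
    previous packet from the same client.
    """
    accepted = []
    last_seen = None
    for t in arrivals:
        ok = last_seen is None or (t - last_seen) >= minimum
        accepted.append(ok)
        last_seen = t  # assumed: updated even when the packet is dropped
    return accepted


# Four requests with no delay: only the first one gets through.
print(simulate_guard_time([0.0, 0.0, 0.0, 0.0]))

# Four requests spaced 2 seconds apart with a 2 second guard time.
print(simulate_guard_time([0.0, 2.0, 4.0, 6.0]))
```

In this model a burst with no delay loses every request after the first, while requests spaced at exactly the guard time all pass. A real server may behave differently at the boundary, so treat it as a way to reason about the symptoms rather than a prediction.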
Do the monitoring systems for the normal pool and beta pool query independently from the same IPs? If so, and a server is in both pools, and they both happened to query around the same time, it could be a problem. But that sounds quite rare.
Yikes! I think the “Los Angeles (4 samples)” beta monitor was running old code that wasn’t waiting 2 seconds between each query. I fixed it just now. Hopefully that solves the unexpected errors. :-/ Sorry about that!
It looks like there may still be some issues with the LA (4 samples) beta monitor (either that or I’m having inexplicable routing issues with only that monitoring node). Every other monitor in the beta system, and all the ones on the main system, have no issues with my systems, but that LA (4 samples) beta monitor consistently fails when talking to my systems.
I think I still see the “4 samples” monitor sending 4 requests with no delay between them. I wanted to look at the code to see how it works. The original post says the old monitor written in Perl was replaced, but there is no link to the new one.
Yikes! Indeed. I made the change in February, but either messed up pushing it to the server or restarting the right process. My workflow hadn’t caught up to two IPv4 monitors running on the same server, I think.
Thank you for pointing this out. I restarted it now and verified the behavior by staring at tcpdump for a bit.
I’m not sure if it’s intended, but I no longer see the monitoring host making any bursts. In tcpdump output I see about 6 requests per hour and it’s a mix of NTPv3 and NTPv4 requests.
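If you want to tally the NTPv3/NTPv4 mix from captured packets rather than by eye, the version number sits in bits 3 to 5 of the first byte of the NTP header (the layout is leap indicator, version, mode). A small sketch, assuming you already have the raw UDP payloads as `bytes`; the helper names are hypothetical:

```python
from collections import Counter


def ntp_version(payload):
    """Return the NTP version number encoded in the first header byte."""
    return (payload[0] >> 3) & 0x7


def ntp_mode(payload):
    """Return the NTP mode (3 = client, 4 = server) from the first header byte."""
    return payload[0] & 0x7


def count_versions(payloads):
    """Count how many payloads use each NTP version."""
    return Counter(ntp_version(p) for p in payloads if p)


# Example payloads: one NTPv3 client request (0x1B) and two NTPv4 ones (0x23).
sample = [bytes([0x1B]) + bytes(47), bytes([0x23]) + bytes(47), bytes([0x23]) + bytes(47)]
print(count_versions(sample))
```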
I noticed that the graphs have different time scales, but even accounting for that, the scores from the new site are consistently lower. By the way, is there a way (like an HTTP GET parameter) to sync up the x axes of the graphs between the two sites?