Beta system now has multiple monitors

beta
monitoring

#24

Hi,

The same goes for me. In the production pool everything is OK with my servers, but from the beta system I get e-mails saying that my servers have a problem.

https://web.beta.grundclock.com/user/urmvj7ct6ryi2dx6zmq

My restrict settings:
restrict default limited kod nomodify notrap nopeer noquery
restrict -6 default limited kod nomodify notrap nopeer noquery

I have no idea what is going on there.


#25

Hi,

It looks like the “4 samples” monitor was getting “not in sync” responses; maybe it’s being rate limited? Which version of ntpd do you run? I wonder if the monitor is more aggressive than ‘ntpdate’ (which is what I was trying to emulate with the 4 requests, 2 seconds between each).

https://web.beta.grundclock.com/scores/78.46.60.40/log?limit=500&monitor=12
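
For anyone who wants to reproduce the query pattern described above (4 requests, 2 seconds between each), here is a minimal sketch in Python. It is not the pool monitor's code; the names `build_client_packet` and `send_samples` are made up for illustration, and the `send` callback is an assumption so the pacing can be checked without touching the network.

```python
import time
from typing import Callable

NTP_PACKET_SIZE = 48  # size of a basic NTP header without extensions


def build_client_packet(version: int = 4) -> bytes:
    """Build a minimal NTP client request (hypothetical helper).

    The first byte packs LI (2 bits), VN (3 bits) and Mode (3 bits).
    Mode 3 means "client"; all other fields are left as zero here.
    """
    if not 1 <= version <= 7:
        raise ValueError("NTP version must fit in 3 bits and be non-zero")
    first_byte = (0 << 6) | (version << 3) | 3
    return bytes([first_byte]) + bytes(NTP_PACKET_SIZE - 1)


def send_samples(
    send: Callable[[bytes], None],
    samples: int = 4,
    spacing: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send `samples` requests, sleeping `spacing` seconds between them.

    No sleep happens before the first request or after the last one.
    Returns the number of requests sent.
    """
    sent = 0
    for i in range(samples):
        if i > 0:
            sleep(spacing)
        send(build_client_packet())
        sent += 1
    return sent
```

With a real UDP socket, `send` could be something like `lambda pkt: sock.sendto(pkt, (host, 123))`; a complete client would also need to read and validate the replies, which this sketch leaves out.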


#26

Hi,

Both time servers run Ubuntu 16.04 with ntpd version 4.2.8p4.

After reviewing some online sources and this thread, I added the following to the configuration: discard average 2 minimum 1. Since then it seems to be OK. I think this was an issue with rate limiting.


#27

[ eh, I was quoting 4.1.1 docs – no wonder I was confused by this! So edited below ]

I put ‘limited’ in the default configuration suggestions on http://www.pool.ntp.org/join/configuration.html – so assuming that’s reasonable then the monitoring system should be working with that default configuration.

The system should be waiting 2 seconds between queries, and the default “minimum” setting is also 2 seconds, so I don’t understand why it’s not working. @bhueske I haven’t taken time to read the tcpdump you sent properly. Can you summarize it for me (and everyone else)?

http://doc.ntp.org/current-stable/accopt.html
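
To make the “minimum” spacing argument concrete, here is a rough sketch of a guard-time check. It is a simplified model and not ntpd's actual implementation: it assumes every packet (dropped or not) resets the per-source timer, it ignores the separate “average” headway check entirely, and the function name `guard_time_drops` is invented for this example.

```python
from typing import Iterable, List, Optional


def guard_time_drops(arrivals: Iterable[float], minimum: float = 2.0) -> List[bool]:
    """Return one flag per packet: True if the model would drop it.

    A packet is flagged when it arrives less than `minimum` seconds after
    the previous packet from the same source. Assumption of this model:
    the timer is updated on every packet, including flagged ones.
    """
    drops: List[bool] = []
    last: Optional[float] = None
    for t in arrivals:
        drops.append(last is not None and (t - last) < minimum)
        last = t
    return drops


# Four packets sent back to back: only the first one gets through.
burst = guard_time_drops([0.0, 0.01, 0.02, 0.03])

# Four packets spaced exactly 2 seconds apart: all pass in this model.
spaced = guard_time_drops([0.0, 2.0, 4.0, 6.0])

# Same spacing, but the second packet arrives slightly early.
jitter = guard_time_drops([0.0, 1.98, 4.0, 6.0])
```

In this model, sending at exactly the guard time leaves no margin: a packet that arrives a few milliseconds early is flagged. Whether real ntpd behaves the same way depends on details not covered here.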


#28

Do the monitoring systems for the normal pool and beta pool query independently from the same IPs? If so, and a server is in both pools, and they both happened to query around the same time, it could be a problem. But that sounds quite rare.


#29

The “4 samples” monitor should be using a separate IP. The “normal” one shares an IP between the normal pool and the beta pool (currently).


#30

Yikes! I think the “Los Angeles (4 samples)” beta monitor was running old code that wasn’t waiting 2 seconds between each query. I fixed it just now. Hopefully that solves the unexpected errors. :-/ Sorry about that!


#31

It looks like there may still be some issues with the LA (4 samples) beta monitor (either that or I’m having inexplicable routing issues with only that monitoring node). Every other monitor in the beta system, plus all the ones on the main system, has no issues with my systems, but the LA (4 samples) beta monitor consistently fails to talk to them.


#32

I think I still see the “4 samples” monitor sending 4 requests with no delay between them. I wanted to look at the code to see how it works. The original post says the old monitor written in Perl was replaced, but there is no link to the new one :-)


#33

Yikes! Indeed. I made the change in February, but either messed up pushing it to the server or restarting the right process. My workflow hadn’t caught up to two IPv4 monitors running on the same server, I think.

Thank you for pointing this out. I restarted it now and verified the behavior by staring at tcpdump for a bit.

The code for the monitor is at https://github.com/ntppool/monitor


#34

Thanks for looking into it.

I’m not sure if it’s intended, but I no longer see the monitoring host making any bursts. In tcpdump output I see about 6 requests per hour and it’s a mix of NTPv3 and NTPv4 requests.
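
For reference, the NTP version of a request can be read from the first byte of the packet, which is how a capture ends up showing a mix of NTPv3 and NTPv4. A small sketch of the decoding (the function name `decode_first_byte` is just for this example):

```python
from typing import Tuple


def decode_first_byte(b: int) -> Tuple[int, int, int]:
    """Split the first byte of an NTP packet into (leap, version, mode).

    Bit layout, most significant bit first:
    LI (2 bits) | VN (3 bits) | Mode (3 bits)
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a single byte value")
    leap = (b >> 6) & 0x3
    version = (b >> 3) & 0x7
    mode = b & 0x7
    return leap, version, mode


# 0x1B is an NTPv3 client request, 0x23 an NTPv4 client request.
v3_request = decode_first_byte(0x1B)
v4_request = decode_first_byte(0x23)
```

Mode 3 is a client request and mode 4 a server reply, so filtering on mode 3 and grouping by version gives the per-version request count.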


#35

Is the beta site intended to be more strict when it comes to handing out scores?

E.g.

http://www.pool.ntp.org/scores/72.14.181.128
https://web.beta.grundclock.com/scores/72.14.181.128

I noticed that the graphs have different time scales, but even accounting for that, the scores from the new site are consistently lower. By the way, is there a means (like an HTTP GET parameter) to sync up the x axes of the graphs between the two sites?

Thanks!

