Bjorn asked me to post my suggestion here for how monitoring should work to avoid false negatives, so here it goes.
I checked the monitor URL for several NTP servers, and it is my belief that there is a problem with the algorithm that determines the score.
When 1 monitor has a timeout, the score goes down immediately. Even on the beta server.
Instead, you should code it differently:
Start with monitor Newark: OK = score / if it fails, try the next monitor, no score yet.
Then monitor LA: OK = score / if it fails, try the next monitor, no score yet.
Then monitor Amsterdam: OK = score / if it fails too => combine all 3 monitor results, and only after all 3 tries is the score bad.
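In Python-ish terms, the cascade I have in mind looks something like the sketch below. This is just an illustration: probe, check, and the way the monitor list is passed in are names I made up, not anything from the real pool code.

```python
from typing import Callable

def probe(server: str,
          monitors: list[str],
          check: Callable[[str, str], bool]) -> bool:
    """Try each monitor in turn; the score only goes bad
    once every monitor has failed for this server."""
    for monitor in monitors:
        # check() is a placeholder for one monitor's NTP test
        if check(monitor, server):
            return True   # one good answer is enough, score stays OK
    return False          # all monitors failed => only now score bad

# e.g. probe("77.109.90.72", ["Newark", "LA", "Amsterdam"], my_ntp_check)
```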
On top of that, average the results per monitor and dedicate the best monitor to each NTP server until it fails.
Not only do you get a good monitoring system, it will also show which monitor is flawed, as a bad monitor will hardly get dedicated to any NTP servers at all.
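A minimal sketch of that dedication idea, assuming the recent results per monitor are kept as lists of pass/fail booleans (best_monitor and the history dict are placeholder names of mine):

```python
def best_monitor(history: dict[str, list[bool]]) -> str:
    """Average each monitor's recent results for this server and
    return the best one; it stays the dedicated monitor until it
    starts failing."""
    def rate(checks: list[bool]) -> float:
        return sum(checks) / len(checks) if checks else 0.0
    return max(history, key=lambda name: rate(history[name]))

# e.g. best_monitor({"Newark": [True, False, False],
#                    "LA": [True, True, True],
#                    "Amsterdam": [True, True, False]})  -> "LA"
```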
Also, make a score go bad only after e.g. 9 failed tests; if the 10th is still bad, then and only then count all the scores and show it.
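Again just a sketch (score_goes_bad and bad_streak are my own names), assuming the last test results per server are kept as a list:

```python
def score_goes_bad(results: list[bool], bad_streak: int = 10) -> bool:
    """Only call the score bad after e.g. 10 failed tests in a row;
    a few scattered timeouts are not enough to sink a server."""
    tail = results[-bad_streak:]                      # most recent tests
    return len(tail) == bad_streak and not any(tail)  # all of them failed
```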
At the moment just a few timeouts are enough to make your score negative, regardless of whether other monitors can reach you.
Just an idea, but based on what I see the score goes bad without a cross-check: 1 bad monitor and it goes down.
The reason for this suggestion is the monitor output on the beta website: as you can see, my NTP server is fine, but 1 monitor fails a lot of the time, marking my server as bad, and that is simply wrong. See for yourself:
https://web.beta.grundclock.com/scores/77.109.90.72/log?limit=200&monitor=*
When that 1 monitor has a timeout the score is bad, even though the other monitors report my server to be fine.
Sadly, the official site only uses 1 monitor, Newark, and if you are unlucky like me, your server's score is terrible and unfair.
As such, I suggest that a better algorithm be put in place, as explained above.
Greetings Bas.