Ohh monitor overview now with statistics
Yes! beta.4 has been released.
I also fixed the bug @avij noticed where a mix of timeouts and good responses was counted as a timeout, and some confusing log messages shown when the API service was down (also noticed by @avij). The client now waits until both the IPv4 and IPv6 IPs have been recorded before showing the user the registration URL.
The website also has options to configure the account limits for monitors (myself and @apuls have access to change them).
Mostly the fixes in beta 4 were for the "selector" that's adjusting which monitors are active for each server. It now filters things out so servers from the same account as the monitor won't be associated, we don't have multiple monitors on the same network monitoring a server, and it generally limits how many monitors from one account might monitor a server.
Are there new FreeBSD tar files? Is there a way to find them without asking?
I thought the previous timeout / good response behaviour was a feature, not a bug
After this change the monitored NTP servers are allowed to have up to 75% packet loss (or 66% depending on configuration) without a penalty. I'm not sure if that's a good thing. Maybe the new score could be calculated from a combination of the good responses and the number of packets lost.
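A rough sketch of what such a combined score step might look like. The function name, the linear weighting, and the baseline step value are all made up for illustration; the pool's actual scoring algorithm works differently:

```python
def score_step(good: int, sent: int, base_step: float = 1.0) -> float:
    """Hypothetical per-run scoring step: scale the usual
    'good response' credit by the fraction of queries that
    were answered, so 75% packet loss no longer scores the
    same as 0% loss."""
    if sent == 0:
        return 0.0
    answered_fraction = good / sent
    # Full credit with no loss, linearly reduced credit as loss grows.
    return base_step * answered_fraction

# 4 queries, all answered: full step.
print(score_step(4, 4))  # 1.0
# 1 of 4 answered (75% loss): only a quarter of the step.
print(score_step(1, 4))  # 0.25
```

A non-linear weighting (e.g. squaring the answered fraction) would penalize heavy loss more steeply while barely affecting servers with occasional drops.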
https://www.ntppool.org/ down? I get a 503.
@john1 without asking: only if you can figure out the build number, I think.
The latest amd64 release build is at https://builds.ntppool.dev/ntppool-agent/builds/release/506/ntppool-agent_4.0.0-beta.4_freebsd_amd64.tar.gz
There are other URLs here:
https://builds.ntppool.dev/ntppool-agent/builds/release/506/checksums.txt
I'll add directory indexes in the web server for the release path so it's easier to discover (now done: Index of /ntppool-agent/builds/release).
Yeah, I'm not sure it's a good thing either! But since it was a bug, I decided to fix it. (The behavior was inconsistent depending on which NTP queries got lost).
I appreciate the point made earlier in the thread that the new monitoring system is less likely to penalize servers that just happen to have a bumpy connection to an individual monitoring server. Some of the current design choices came from my frustration with all the threads complaining about the monitoring system being unreliable when it was actually just flagging those problematic connections.
One reason we do multiple queries is to detect servers with overly aggressive rate limits.
Since the system has multiple monitors (and associates monitors that get good results with specific servers), some options for varying monitoring patterns won't work anymore.
Maybe the client should do a random number of samples from 1 to X instead of always doing X samples. Or randomly choose between a single sample and X samples. The penalty for a timeout is pretty steep, so that would still flag servers with significant packet loss. (And if the loss is just to an individual monitor, the system will replace that monitor).
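The two sampling variants suggested above could look something like this sketch. The function names and the default of 4 samples are illustrative assumptions, not the agent's actual code:

```python
import random

def pick_sample_count(max_samples: int = 4) -> int:
    """Variant 1: send a uniformly random number of queries
    between 1 and max_samples on each check."""
    return random.randint(1, max_samples)

def pick_single_or_full(max_samples: int = 4) -> int:
    """Variant 2: flip a coin between a single query and the
    full sample count."""
    return random.choice([1, max_samples])
```

Either way a server dropping most packets would still hit timeouts often enough to be flagged, while well-behaved servers see fewer queries on average, which also reduces the chance of tripping rate limits.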
Suggestions and input are very welcome. As I think was discussed above (or in another thread), the monitoring system tries to behave like an average NTP client would. What we're testing for is "does this server work well for a typical NTP client". (You could argue the premise is flawed; despite how straightforward the low-level protocol is, clients have wildly different behaviors).
On my manage/monitors page, there is a blue block with statistics:
I only have one monitor, zakim1-yfhw4a, and its statistics lower down are:
So it looks like the blue block, which I assume is the total, is double the monitor's value. zakim1-yfhw4a does have both an IPv4 and an IPv6 address registered. Does that cause the numbers to be doubled?
Thanks, that's exactly what I was recently referring to, in other words:
Sorry, I did not follow the recent code changes. Does that mean that if a single reply packet is lost from one monitoring run (3 or 4 or however many query packets, sent one after another with small time spacing), this shows up as a decrease in the NTP server's score?
My opinion is that instead of trying to simulate any kind of NTP client, the objective of the monitoring should be to get the most precise picture of the NTP servers' state, sticking to the protocol specification.
The target is not to simulate a client as such. The target is to assess a server's performance the way a client would see it, so as to assess its suitability for a client to get time off that server. To some extent, that obviously requires it to behave a bit like a client, but not fully simulate it, and certainly not just for the sake of it.
I still think that, to some extent, that is beside the point of the monitoring. Again, the monitoring is about assessing how suitable a server is for clients to get time from it; it is not a detailed performance monitoring system (e.g., it isn't even collecting detailed low-level information such as dispersion). And clients are rather resilient to some disturbances, which needs to be taken into account somehow.
Now whether that is done "early" in the assessment process, i.e., as now, shortly after gathering the samples, or later in the process, I don't mind. I tend to agree that gathering more detailed information first, and then "smoothing" it somehow later to make it more realistic (in the sense above, i.e., suitability for clients to get time) might be preferable. I just have the impression that that would require touching more sensitive parts of the system, and isn't easily reconcilable with the current architecture.

And that it would further bias the system to primarily cater to servers in better-served parts of the world, where detailed nuances of performance would even be visible, i.e., evolve to double as a detailed performance monitoring system in those places. And continue to ignore, or even worsen, the conditions in less well-served parts of the world, where a client can be happy, and often has to be, to get a response to only every nth request, or with higher offset or larger delay, rather than not getting any usable service at all.
Can you elaborate on how the protocol specification would prevent more summarily taking a view as the monitors currently do, or as typical NTP clients do? Or how the current behavior violates the specification?
All in all, I agree that 75% packet loss should not be treated the same as 0% packet loss. I have a lot of sympathy for the proposal to reflect that difference via appropriate scaling/weighting of the scoring steps.
I understand the frustration. And I think that the complaints you refer to are a bit misleading and stem from some misunderstandings. As has often been pointed out in the context of many such complaints, e.g., by @avij, it is not the monitoring itself, or the monitors, or their placement that is causing the issues. Rather, the monitoring, as intended, only reflects what a typical client would see. And unfortunately, in many places of the world, that view isn't good.
Thus, the way to head off those complaints perhaps isn't tinkering with the monitoring, but rather addressing the underlying, well-known, and acknowledged bad situation in large parts of the world.
At one point, you mentioned having specific plans to address those issues (with some questions still needing further thought). Not sure where you stand on following up on those plans/ideas, would be interesting to hear your latest thoughts on those.
The aim of the whole NTP pool is to provide acceptable NTP service even in those particular places of the world. However, these conditions should not influence the design and implementation of the NTP monitoring; rather, they should influence the way the monitoring result is used when selecting NTP servers for the NTP clients in those areas.
I see this not as adding more restrictions, but rather as a relief for the monitoring system's design and implementation. For example, we should not feel obliged to account for broken NTP client implementation(s) in the monitoring code.

rather influence the way the monitoring result is used when selecting NTP servers for the NTP clients to those areas
Ok, then maybe we are more on the same page than it might have seemed so far. I.e., the discussion should not focus on the aspect of number of samples in isolation, but would need a broader scope.
Similar to what I understood one of Dave's points to be, the concern was more about how wide the scope of the activity could be, and how much the current code architecture can be impacted.
So far, I had the impression the current changes were primarily about diversifying the sample collection process, i.e., get more monitors online, and more easily, and evolve the monitor selection.
All of those, especially the last, obviously also impact the scoring. But the actual scoring algorithm was not changed. The changes you propose would start to affect that part of the system as well.

we should not feel to be obliged to take into account broken NTP client implementation(s) into the monitoring code
Ok, fully agree. Still not sure though how sticking to the specification relates specifically to the question of how many samples should be taken (which I had understood to be the main issue of the discussion so far).

Still not sure though how that relates specifically to the question of how many samples should be taken (which I had understood to be the main issue of the discussion so far).
My statement wasn't related to the number of samples. It was rather related to the selection of the NTP client. For example, relative to normal NTP clients (ntpd, chronyd), SNTP clients may provide bad quality time, but the NTP pool cannot help with that.
A few random data points. Unfortunately I donāt have a lab with all the possible client software and versions, but ntpdate in CentOS 7 seems to send 4 queries with two second intervals. C7 has been EOL for a long time and it can probably be mostly disregarded. chrony on Rocky Linux 9 with the iburst option seems to send five queries with a two second interval at startup (in this example case, YMMV). On the other hand, servers that use iburst but do not receive responses to some of the queries will typically tolerate the situation.
I do think that we should still check for overly aggressive rate limiting. The cleanest approach might be that the monitors report to the pool monitor management server the number of queries sent, the number of good responses received and the number of error/timeout responses. Or just the number of good responses, with the error count deduced from the number of good responses and project configuration. This would enable making better score calculations later on.
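A minimal sketch of the "just report good responses" variant described above. The constant and function names are made up for illustration; the actual report format would be whatever the pool's monitor API defines:

```python
# Assumed project-wide configuration: queries sent per check.
QUERIES_PER_CHECK = 4

def deduce_timeouts(good_responses: int,
                    queries_sent: int = QUERIES_PER_CHECK) -> int:
    """If monitors only report the number of good responses, the
    management server can derive the error/timeout count from the
    configured query count, instead of monitors reporting it."""
    return max(queries_sent - good_responses, 0)

print(deduce_timeouts(3))  # 1: one lost query out of four
print(deduce_timeouts(4))  # 0: a clean run
```

Having the raw counts server-side, rather than a pre-collapsed pass/fail, is what would later enable the weighted score calculations discussed earlier in the thread.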

It was rather related to the selection of the NTP client.
Ah, ok, sorry I misunderstood.

SNTP clients may provide bad quality time, but the NTP pool cannot help with that.
Fully agree. My concern was partly that getting only one sample would make the system as sensitive to disturbances as a simplistic SNTP client, while still not making up for the shortcomings of such clients, even with the most vigorous vetting of servers.
@john1
I need the tar.gz file for a special case. To get the build number, I have the agent installed on a Debian system too.
During the update process you will see the build number and can use it in the URL.
Forget what I've written, I just saw that Ask set up a directory listing.
Thank you @Ask
I released beta.5 of the new monitor. More testers are still welcome!
Send a message to myself and @apuls with the account information if you don't have access to add a beta monitor at pool.ntp.org, and we can get you set up and added to the discourse category for monitor operators.
Finally getting around to try and test, just been so busy here, sorry all! But Iām getting the following error:
[root@web02 ~]# sudo -u ntpmon ntppool-agent setup -e test -a 2jp2meg
time=2025-07-07T12:10:24.593-07:00 level=INFO msg="using hostname for registration" env=test hostname=web02.versadns.com
ntppool-agent: error: Post "https://beta-api.ntppool.dev/monitor/api/registration": dial tcp6 [2a04:4e42::311]:443: connect: network is unreachable
[root@web02 ~]#
@csweeney05 do you have IPv6 configured? Maybe the agent isn't detecting if there's IPv6 at all actually.
You can run it with --no-ipv6 and it won't try the IPv6 interface.
I had it fail if a protocol it's trying doesn't work, to make the problem explicit, but as mentioned, maybe I'm not even checking whether the system has an IP on that protocol.