Without monitoring and server scoring, the pool would serve bad time to some clients. Was that the case?
Critically, it’s a one-man show. This time, it took you more than a day to address the failure of the pool management. Without peers, the migration was evidently poorly planned and executed. The one-man team is a liability that you should, honestly, change.
It’s a bit unfair to say pool.ntp.org is a one-man show. While it’s true that it’s a volunteer project with most work done by one person, there are other active volunteers. If it were a commercial service, I’d agree more diversity would be preferable.
As a volunteer nonprofit project with no service guarantees, there are real benefits to having a benevolent dictator. Ask is clearly providing quality engineering and customer service, with admirable openness and tolerance both for the broad diversity in experience and competence among the many monitor and NTP server operators participating, and for what I personally consider highly abusive behavior, which accounts for the vast majority of queries.
As an example, AWS long passively encouraged a proliferation of VMs directly querying the pool when their customers would have been much better served by NTP servers in the same rack or at least datacenter. This was done by failing to modify defaults in their base VM images.
Thanks to relatively heroic efforts by folks including Brad Knowles, they have improved access to and promotion of very high quality local NTP and PTP services. Nonetheless, a good case can be made that they have been and continue to be poor netizens in relation to the pool, as they do not provide gratis pool service. Contrast with Cloudflare’s outstanding support for the pool.
I quite disagree. The number of PRs left open for years on GitHub, not to mention the neglect of full IPv6 support, is the sign of anything but a benevolent dictator.
That stuff all sounds awfully dreary to me, but I understand the motivation. Are you volunteering to take on that type of work for the project?
It is also possible to have the appearance of hitting all your checkpoints and still have nonobvious single points of failure or choke points that butterfly wings could bring down.
I just want to share that since the 404 error on server monitor metrics started appearing on 3/16, traffic to my server has increased dramatically (from 40-50K requests per second to over 80K requests per second), as shown in the real-time statistics of my router below.
Both my server and router are handling the load very well, but I am curious whether this is a coincidence or is caused by the 404 error.
From a management point of view, I do agree that this important project could benefit from the establishment of reasonable (but not excessive) structures and processes, so that it can be well maintained, with timely failure resolution, in the long run.
Having the server metrics inoperative for a prolonged period of time makes server management harder and damages the public reputation of this project (even if the actual NTP service is unaffected).
If you have been here long enough, you might have heard about the delayed responses to vendor zone applications in previous years, which was another example of the lack of process and timely management affecting the operation and reputation of the Pool.
Would it be possible to start a new NTP pool to overcome some of the issues we have been facing over the last few years? Surely this would be a massive undertaking, but we could start small.
Of course it’s possible, but the threshold for forking an open source software project is historically higher. Methinks we’re still far from that, and not all remedies have been tried yet.
I’d be really interested to see and participate in a new pool that is IPv6-only from the start.
As it seems that full IPv6 adoption on the pool’s side has intentionally not been pursued over a period of many years, it would be a clear differentiator and an additional reason for the existence of another pool.
There used to be one, which was akin to a club of timekeeping aficionados. They would add their servers, all stratum 1, to their publicly available pool and would even meet up to exchange information and socialize. Unfortunately, I lost the bookmark eons ago, and my memory has failed me in trying to find them again.
As far as I can see, the monitoring is currently working fine, and has worked fine for the past week. Monitors probe the NTP servers as usual, and if a pool NTP server becomes unavailable, it will get dropped from the pool. The only impact seems to be missing graphs and CSV files on the server score pages. I don’t think the sky is falling.
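To make the "dropped from the pool" mechanism concrete, here is a minimal sketch of how a decaying score with an inclusion threshold behaves. The decay factor, step values, score cap, and threshold below are assumptions chosen for illustration; none of them are stated in this thread, and the function names are hypothetical rather than the pool's actual code.

```python
# Illustrative sketch only: all constants are assumed, not confirmed by this thread.

def update_score(score, step, decay=0.95, max_score=20.0, min_score=-100.0):
    """Apply one monitor probe result: decay the old score, add the step, clamp."""
    new_score = score * decay + step
    return max(min_score, min(max_score, new_score))


def in_pool(score, threshold=10.0):
    """A server is handed out to clients only while its score meets the threshold."""
    return score >= threshold


def simulate(steps, start=0.0):
    """Return the score after each probe result in `steps`."""
    history = []
    score = start
    for step in steps:
        score = update_score(score, step)
        history.append(score)
    return history


if __name__ == "__main__":
    # A healthy server (score at the cap) that then misses two probes in a row.
    for score in simulate([-5, -5], start=20.0):
        print(f"score={score:.2f} in_pool={in_pool(score)}")
```

Under these assumed numbers, a server at the cap survives one failed probe but falls below the threshold after the second, which is consistent with the point that unavailable servers still get removed even while the graphs are missing.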
No, indeed, it is not. But, e.g., I am having issues with one of my servers (low score), and I have not really been able to troubleshoot for a few days, as I can’t see the details of what is going on, e.g., whether there is a pattern to which monitors see the issues, and what kind of issues they are.
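For context, this is the kind of per-monitor breakdown the missing CSV files make possible. The sketch below is a rough illustration: the column names (`ts_epoch`, `offset`, `step`, `score`, `monitor_name`, `error`) are assumed, the sample rows are invented, and `summarize_by_monitor` is a hypothetical helper, not part of any pool tooling.

```python
import csv
import io
from collections import defaultdict

# Invented sample data; the column layout is an assumption for illustration.
SAMPLE_LOG = """\
ts_epoch,offset,step,score,monitor_name,error
1710600000,0.0012,1,18.5,monitor-a,
1710600900,,-5,12.6,monitor-b,i/o timeout
1710601800,0.0009,1,13.0,monitor-a,
1710602700,,-5,7.3,monitor-b,i/o timeout
"""


def summarize_by_monitor(csv_text):
    """Count probes, failed probes (negative step), and error kinds per monitor."""
    stats = defaultdict(
        lambda: {"probes": 0, "failures": 0, "errors": defaultdict(int)}
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = stats[row["monitor_name"]]
        entry["probes"] += 1
        if float(row["step"]) < 0:
            entry["failures"] += 1
            entry["errors"][row["error"] or "unspecified"] += 1
    return stats


if __name__ == "__main__":
    for name, entry in sorted(summarize_by_monitor(SAMPLE_LOG).items()):
        print(name, entry["probes"], entry["failures"], dict(entry["errors"]))
```

If all failures cluster on one monitor, as in the invented sample, that points to a path or routing problem toward that monitor rather than a fault on the server itself.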
Also, I do not think the point is that there currently is an issue; that can happen anywhere, at any time. The point is rather that this keeps happening, that there are delays in getting things addressed, and that other parts of the project are being stifled as well (e.g., IPv6), because there factually is a bottleneck in the project. And some people keep appeasing and almost insisting that it has to be that way, rather than considering how the situation could be improved, even if that still does not result in a perfect setup. When one person’s valid dinner obligation means that a non-negligible outage takes over a day to resolve, that should make one think. I greatly appreciate Ask’s hard work, his knowledge, and his steady hand in running the project, and wouldn’t want to miss that. But even a benevolent dictator doesn’t have to do everything themselves; they can enlist help, e.g., to work on less central parts of the system under the dictator’s guidance. E.g., why is the recommendation on the pool pages still to use the server stanza? Why is there only one person who can add/create vendor zones? It can’t be so difficult that someone with technical experience couldn’t learn how to do it.
And it’s not just the core parts of the system. Support channels are the same. Again, I very much appreciate the work done by staff on that side. At the same time, it seems quite random when requests get a response, or whether they get one at all. I really wonder why it apparently has to be this frustrating for pool operators who need support and are just volunteers themselves, and why it is not possible to add to pool staff when the existing staff, quite understandably, may no longer have as much time to look after things as they used to.
Makes me sad, really, to see so much enthusiasm by other volunteers getting stifled, and many turning away, due to the needlessly rigid setup of the project.