Hello guys,
Monitoring and DNS services appear to be experiencing issues today, but the reported status remains fully green.
I can confirm that monitoring metrics are not available.
Possibly due to the infrastructure migration:
Validating new server returns 500 (both IPv4 and IPv6).
Something is going on: I was trying to change my bandwidth setting and received an API error similar to the one @Kets_One reported above.
I stupidly logged out and now can’t log back in.
Hi everyone, I’m working on it (and have been for a few hours). One of the internal CAs in the old cluster had a bad day. (My attention has been on moving things to the new clusters, so I missed the alerts in the noise.)
Alright, it should be all better now (and monitoring improved for the applications in the new cluster).
Internally, some components use Vault for secrets and database credentials. Renewal of an internal certificate for Vault itself failed (spectacularly), to the point that the relevant internal CA was gone from Vault. (It would have expired in a few months, but by then that cluster will be shut down, so indirectly this was related to the “construction debris” from building the new clusters.)
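For anyone who wants to watch certificate lifetimes on their own servers, here is a minimal sketch of an expiry check. It is not how the pool's internal monitoring works (that isn't described in this thread); the function names and the 30-day threshold are assumptions chosen for illustration. It only covers the "about to expire" case and would not catch a CA disappearing from Vault.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return whole days until a certificate's notAfter timestamp.

    not_after uses the format found in the dict returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan  1 00:00:00 2030 GMT".
    now is seconds since the epoch; it defaults to the current time.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires - current) // SECONDS_PER_DAY)


def needs_attention(not_after, warn_days=30, now=None):
    """True if the certificate expires in fewer than warn_days days.

    warn_days=30 is an arbitrary illustrative threshold.
    """
    return days_until_expiry(not_after, now) < warn_days
```

The `notAfter` string can be taken from `getpeercert()` on a connected TLS socket and passed to `needs_attention()` from a cron job or a monitoring check.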
Me too, the monitor plots have already been restored. Maybe it’s an indication that the migration is going fine? Good luck with everything. Cheers.
It seems the www.ntppool.org website has been down since today (2026-03-05 0200Z) and is giving 503 errors again. I understand there’s an infrastructure migration going on, so I’m just curious.
Also, I have noticed a drop in NTP traffic similar to the one during the earlier downtime with the missing monitor checks, which might indicate that some issue is also going on with the DNS and monitoring systems?
I can confirm that for now, despite the NTP Pool main website and server management page being down, DNS service is working fine.
I also collect some telemetry from the log page, which is also down:
/log?limit=200&monitor=*
Just to report it.
Cheers.
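Since the website and the DNS service can fail independently, it can help to probe them separately before reporting. A rough sketch, using only the Python standard library; the function names are made up for this example, and `pool.ntp.org` in the usage note is an assumed hostname (only www.ntppool.org is named in this thread).

```python
import socket
import urllib.error
import urllib.request


def classify_http_status(code):
    """Map an HTTP status code (or None for no response) to a short label."""
    if code is None:
        return "unreachable"
    if 200 <= code < 400:
        return "ok"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unexpected"


def probe_http(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still useful.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def probe_dns(hostname):
    """Return the sorted set of addresses hostname resolves to, or []."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

For example, `classify_http_status(probe_http("https://www.ntppool.org/"))` returning "server error" while `probe_dns("pool.ntp.org")` still returns addresses would match the situation described above: website down, DNS working.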
First: Sorry about the prolonged outage Thursday/Friday.
I’ve been working on the data-api issue. I moved many billions of analytics data points and missed some necessary changes to the API code itself (and then had to go to dinner). If it were something more critical I’d have rolled back, but I’m trying to keep momentum on the overall infrastructure migration.
Just goes to show that the pool is fragile by design, being centered on one infrastructure run by one dude in Nebraska.
@ebahapo You could draw the opposite conclusion, too.
It’s robust by design in that crucial parts of the infrastructure can be unavailable for a prolonged period without affecting the service people use.
The NTP service is extremely distributed and diversified, and the DNS service is very distributed (narrowing down to some components that are not distributed at all and that far fewer people have access to).