Monitoring and DNS services seem to have issues today (2026-03-01)

Hello guys,

Monitoring and DNS services appear to be experiencing issues today, but the reported status remains fully green.

The monitoring status indicates that everything is functioning correctly and is in good condition.

However, that is clearly not the case.

I can confirm that monitoring metrics are not available.

Possibly due to the infrastructure migration:

I’m also getting an error when changing the net speed of my servers:

@ask could you please check?

Validating new server returns 500 (both IPv4 and IPv6).

Something is going on: I was trying to change the bandwidth setting and received an API error similar to the one @Kets_One reported above.

I stupidly logged out and now can’t log back in.

Hi everyone, I’m working on it (and have been for a few hours). One of the internal CAs in the old cluster had a bad day. (My attention has been on moving things to the new clusters, so I missed the alerts in the noise.)

Alright, it should be all better now (and monitoring improved for the applications in the new cluster).

Internally, some components use Vault for secrets and database credentials. Renewal of an internal certificate for Vault itself failed (spectacularly), and the relevant internal CA disappeared from Vault in the process. (It would have expired in a few months anyway, but by then that cluster will be shut down, so indirectly this was related to the “construction debris” from building the new clusters.)

Indeed, monitoring is back and I’m able to log in again.

Thanks @ask

Me too, the monitor plots have already been restored. Maybe it’s an indication that the migration is going well? Good luck with everything. Cheers.

It seems the www.ntppool.org website has been down since today (2026-03-05 0200Z) and is returning 503 errors again. I understand there’s an infrastructure migration going on, so I’m just curious.

Also, I noticed a drop in NTP traffic that matches the period of missing monitor checks, which might indicate that something is also going on with the DNS and monitoring systems.

And it seems like the pages for the users and servers are down too:

I can confirm that for now, despite the NTP Pool main website and server management page being down, DNS service is working fine.

I also collect some telemetry from the log page, which is down as well:

/log?limit=200&monitor=*

Just to report it.

Cheers.

The 503 error on the server page has appeared again:

First: Sorry about the prolonged outage Thursday/Friday. :frowning:

I’ve been working on the data-api issue. I moved many billions of analytics data points and missed some necessary changes to the API code itself (and then had to go to dinner). If it had been something more critical, I’d have rolled back, but I’m trying to keep momentum on the overall infrastructure migration.

Just goes to show that the pool is fragile by design, being centered on one infrastructure run by one dude in Nebraska.

@ebahapo You could draw the opposite conclusion, too.

It’s robust by design, in that crucial parts of the infrastructure can be unavailable for a prolonged period without affecting the service people use.

The NTP service is extremely distributed and diversified, and the DNS service is very distributed (narrowing down to some components that are not distributed at all, with far fewer people having access).