Monitoring and DNS services seem to have issues today (2026-03-01)

canope · March 1, 2026, 10:13am

Hello guys,

Monitoring and DNS services appear to be experiencing issues today, but the reported status remains fully green.

canope · March 1, 2026, 10:14am

The monitoring status indicates that everything is functioning correctly and is in good condition.

canope · March 1, 2026, 10:15am

However, that is clearly not the case.

HAHAHAHA · March 1, 2026, 10:24am

I can confirm that monitoring metrics are not available.

Possibly due to the infrastructure migration:

Kets_One · March 1, 2026, 2:33pm

I’m also getting an error when changing netspeeds of my servers:

@ask could you please check?

cincura.net · March 1, 2026, 9:32pm

Validating new server returns 500 (both IPv4 and IPv6).

David_NZ · March 1, 2026, 10:34pm

Something is going on: I was looking to change the bandwidth setting and received an API error similar to @Kets_One above.

I stupidly logged out and now can’t log back in.

ask · March 2, 2026, 3:17am

Hi everyone, I’m working on it (since a few hours ago). One of the internal CAs in the old cluster had a bad day. (And my attention has been on moving things to the new clusters, so I missed the alerts in the noise.)

ask · March 2, 2026, 4:16am

Alright, it should be all better now (and monitoring improved for the applications in the new cluster).

Internally, some components use Vault for secrets and database credentials. Renewal of an internal certificate for Vault itself failed (spectacularly), including having the relevant internal CA be gone from Vault. (It would have expired in a few months, but by then that cluster will be shut down, so indirectly this was related to the “construction debris” from building the new clusters.)
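
A rough sketch of one way to notice a certificate that is close to expiry (for example because a renewal silently failed), assuming the service exposes a TLS endpoint. This is not how the pool's Vault setup is actually monitored; the function names `fetch_not_after` and `days_until_expiry` and the 30-day threshold are illustrative assumptions.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  2 00:00:00 2026 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read the notAfter field of the certificate served at host:port.

    Assumes the chain verifies against the system trust store. For an
    internal CA, load it first with ctx.load_verify_locations(...).
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_attention(not_after: str, now: datetime, threshold_days: float = 30.0) -> bool:
    """True when the certificate expires within the (assumed) threshold."""
    return days_until_expiry(not_after, now) < threshold_days
```

A periodic job could call `needs_attention(fetch_not_after(host), datetime.now(timezone.utc))` and alert on `True`. A check like this only covers expiry; it would not by itself detect a CA that has gone missing from Vault.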

David_NZ · March 2, 2026, 4:45am

Indeed, monitoring is back & I’m also again able to log in

Thanks @ask

santiago · March 2, 2026, 2:30pm

Me too, the monitor plots have already been restored. Maybe it’s an indication that the migration is doing fine? Good luck with everything. Cheers.

elvisimprsntr · March 2, 2026, 4:03pm

imi415 · March 5, 2026, 7:23am

It seems the www.ntppool.org website has been down since today (2026-03-05 0200Z), giving 503 errors again. I understand there’s an infrastructure migration going on, so I’m just curious.

Also, I have noticed a drop in NTP traffic similar to the one during the downtime with the missing monitor checks, which might indicate some issue is also going on with the DNS and monitoring system?

gunnar · March 5, 2026, 11:42am

And it seems like the pages for the users and servers are down too:

HAHAHAHA · March 5, 2026, 1:34pm

I can confirm that for now, despite the NTP Pool main website and server management page being down, DNS service is working fine.

santiago · March 6, 2026, 1:11pm

I also collect some telemetry from the log page, which is also down

/log?limit=200&monitor=*

Just to report it.

Cheers.
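
For anyone else scraping pages like this, a minimal sketch of a probe that records a 5xx as an outage instead of crashing the collector. The function names and the result labels are assumptions for illustration, and the full URL is left to the caller.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse label for telemetry."""
    if 200 <= status < 300:
        return "ok"
    if 500 <= status < 600:
        return "server_error"
    return "other"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` once and return a label; never raises on HTTP or network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx; the code is on the exception.
        return classify_status(exc.code)
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return "unreachable"
```

Logging the label with a timestamp on every poll makes gaps like this one show up as a run of `server_error` entries rather than missing data.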

HAHAHAHA · March 8, 2026, 1:01am

The 503 error on the server page appears again:

ask · March 8, 2026, 4:58am

First: Sorry about the prolonged outage Thursday/Friday.

I’ve been working on the data-api issue. I moved many billions of analytics data and missed some necessary changes to the API code itself (and then had to go to dinner). If it was something more critical I’d have rolled back, but I’m trying to keep momentum on the overall infrastructure migration.

ebahapo · March 9, 2026, 4:09am

Just goes to show that the pool is fragile by design, being centered on one infrastructure run by one dude in Nebraska.

ask · March 9, 2026, 9:04pm

@ebahapo You could draw the opposite conclusion, too.

It’s robust by design in that crucial parts of the infrastructure can be unavailable for a prolonged period without affecting the service people use.

The NTP service is extremely distributed and diversified, the DNS service is very distributed (narrowing into some components not distributed at all and with far fewer people having access).
