Some servers polled less frequently since noon today

I noticed that some (but not all) servers get polled less frequently now (example):


Other examples:
94.237.125.46, 45.115.225.48, 94.237.79.110, 2a01:4f9:4a:460a::2 (these were not hard to find)

However, according to https://status.ntppool.org/ the number of monitoring probes has not dropped. Perhaps there is a database problem and the results are being stored on only one database node instead of the entire cluster? Alerting @ask in any case.

Similar story

I noticed that too. About half of my servers are still polled regularly, the rest much less than usual.

I’m not sure if it is a coincidence, but it only seems to affect the IPv4 side of my servers. The IPv6 graphs look normal.

For me too, except for one IPv6 server which is also affected.

Looks like the monitors stopped monitoring/reporting on 2026-01-02 at 15:57 UTC

Is the monitoring system broken?

https://www.ntppool.org/scores/69.111.183.21/log?limit=200&monitor=*

ts_epoch,ts,offset,step,score,monitor_id,monitor_name,rtt,leap,error
1767371341,2026-01-02 16:29:01,0.001829978,1,20,308,ustul1-762zq1,33.212,,
1767369424,2026-01-02 15:57:04,,1,20,24,recentmedian,,,
1767369424,2026-01-02 15:57:04,-0.000186416,1,20,87,usewr3-1a6a7hp,36.944,,
1767368355,2026-01-02 15:39:15,0.003878782,1,20,11,nlams1-1a6a7hp,121.811,,
1767368305,2026-01-02 15:38:25,-0.001088565,1,20,112,usiad1-1tcp71g,32.567,,
1767368245,2026-01-02 15:37:25,-0.001254963,1,20,277,usdca1-3grrbhg,40.154,,
1767368192,2026-01-02 15:36:32,-0.000874423,1,20,306,deham1-2zg3vnt,132.56,,
1767366652,2026-01-02 15:10:52,-0.000383374,1,20,188,ussuu1-3strqkc,36.667,,
1767365887,2026-01-02 14:58:07,-0.000178499,1,20,188,ussuu1-3strqkc,37.026,,
1767365488,2026-01-02 14:51:28,-0.000330339,1,20,308,ustul1-762zq1,33.407,,
1767365480,2026-01-02 14:51:20,-0.000151894,1,20,87,usewr3-1a6a7hp,37.219,,
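
For anyone who wants to check their own servers the same way, here is a rough sketch that scans a CSV log in the format above and reports long gaps between probe results. The function name `find_gaps` and the 30-minute threshold are my own assumptions, not anything the pool defines, and rows without an offset (such as `recentmedian`) are skipped on the assumption that they are not actual probes.

```python
import csv
import io
from datetime import datetime, timezone

# The log rows quoted above, pasted verbatim.
LOG_CSV = """\
ts_epoch,ts,offset,step,score,monitor_id,monitor_name,rtt,leap,error
1767371341,2026-01-02 16:29:01,0.001829978,1,20,308,ustul1-762zq1,33.212,,
1767369424,2026-01-02 15:57:04,,1,20,24,recentmedian,,,
1767369424,2026-01-02 15:57:04,-0.000186416,1,20,87,usewr3-1a6a7hp,36.944,,
1767368355,2026-01-02 15:39:15,0.003878782,1,20,11,nlams1-1a6a7hp,121.811,,
1767368305,2026-01-02 15:38:25,-0.001088565,1,20,112,usiad1-1tcp71g,32.567,,
1767368245,2026-01-02 15:37:25,-0.001254963,1,20,277,usdca1-3grrbhg,40.154,,
1767368192,2026-01-02 15:36:32,-0.000874423,1,20,306,deham1-2zg3vnt,132.56,,
1767366652,2026-01-02 15:10:52,-0.000383374,1,20,188,ussuu1-3strqkc,36.667,,
1767365887,2026-01-02 14:58:07,-0.000178499,1,20,188,ussuu1-3strqkc,37.026,,
1767365488,2026-01-02 14:51:28,-0.000330339,1,20,308,ustul1-762zq1,33.407,,
1767365480,2026-01-02 14:51:20,-0.000151894,1,20,87,usewr3-1a6a7hp,36.944,,
"""


def find_gaps(csv_text, threshold_s=1800):
    """Return (start_epoch, end_epoch, seconds) for each gap above threshold_s.

    threshold_s is a hypothetical cutoff chosen for illustration only.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    # Keep only rows that carry a measured offset (assumed to be real probes).
    epochs = sorted({int(r["ts_epoch"]) for r in rows if r["offset"]})
    return [
        (a, b, b - a)
        for a, b in zip(epochs, epochs[1:])
        if b - a > threshold_s
    ]


def fmt(epoch):
    """Format an epoch timestamp as a UTC date and time."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    for start, end, seconds in find_gaps(LOG_CSV):
        print(f"{fmt(start)} -> {fmt(end)} ({seconds} s)")
```

On the rows above this flags a single gap, from 15:57:04 to 16:29:01 UTC. A lower threshold would also flag the shorter pause before 15:36, so the cutoff needs tuning to how often your server is normally polled.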

I think it’s working again now.

@avij how did you spot that so fast?! :slight_smile:

Thanks everyone. One of the ClickHouse shards had run out of storage (~150GB each) because the new system generates much more data. Monitoring was still happening, but the graphs are driven by the ClickHouse database, which was not accepting data consistently (depending on which shard was used).

It should all be backfilled now.

(I’ve been moving house and working on moving the NTP Pool system to new server infrastructure, so my time to pay attention to the existing system has been stretched thin – I appreciate you all looking out for oddities!)
