Some servers polled less frequently since noon today

I noticed that some (but not all) servers are being polled less frequently now (example):


Other examples:
94.237.125.46, 45.115.225.48, 94.237.79.110, 2a01:4f9:4a:460a::2 (these were not hard to find)

However, according to https://status.ntppool.org/, the number of monitoring probes has not dropped. Maybe there are database problems and the results are being stored on only one database node instead of the entire cluster? Alerting @ask in any case.

Similar story

I noticed that too. About half of my servers are still polled regularly, the rest much less often than usual.

I’m not sure if it is a coincidence, but it only seems to affect the IPv4 side of my servers. The IPv6 graphs look normal.

For me too, except for one IPv6 server, which is also affected.

Looks like the monitors stopped monitoring/reporting on 2026-01-02 at 15:57 UTC.

Is the monitoring system broken?

https://www.ntppool.org/scores/69.111.183.21/log?limit=200&monitor=*

ts_epoch,ts,offset,step,score,monitor_id,monitor_name,rtt,leap,error
1767371341,2026-01-02 16:29:01,0.001829978,1,20,308,ustul1-762zq1,33.212,,
1767369424,2026-01-02 15:57:04,,1,20,24,recentmedian,,,
1767369424,2026-01-02 15:57:04,-0.000186416,1,20,87,usewr3-1a6a7hp,36.944,,
1767368355,2026-01-02 15:39:15,0.003878782,1,20,11,nlams1-1a6a7hp,121.811,,
1767368305,2026-01-02 15:38:25,-0.001088565,1,20,112,usiad1-1tcp71g,32.567,,
1767368245,2026-01-02 15:37:25,-0.001254963,1,20,277,usdca1-3grrbhg,40.154,,
1767368192,2026-01-02 15:36:32,-0.000874423,1,20,306,deham1-2zg3vnt,132.56,,
1767366652,2026-01-02 15:10:52,-0.000383374,1,20,188,ussuu1-3strqkc,36.667,,
1767365887,2026-01-02 14:58:07,-0.000178499,1,20,188,ussuu1-3strqkc,37.026,,
1767365488,2026-01-02 14:51:28,-0.000330339,1,20,308,ustul1-762zq1,33.407,,
1767365480,2026-01-02 14:51:20,-0.000151894,1,20,87,usewr3-1a6a7hp,37.219,,
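For anyone who wants to run the same check on their own server, here is a minimal Python sketch against that scores-log CSV endpoint. The server IP is the one from the link above; the 30-minute gap threshold is an arbitrary illustration, not an official polling cadence.

```python
#!/usr/bin/env python3
"""Fetch an NTP Pool scores log and flag gaps between consecutive samples."""
import csv
import io
import urllib.request
from datetime import datetime, timezone

SERVER_IP = "69.111.183.21"   # the example server linked above
URL = f"https://www.ntppool.org/scores/{SERVER_IP}/log?limit=200&monitor=*"
GAP_SECONDS = 30 * 60         # arbitrary threshold for a "suspicious" gap

with urllib.request.urlopen(URL) as resp:
    rows = list(csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8")))

# The log is newest-first; walk it oldest-first by the epoch column.
rows.sort(key=lambda r: int(r["ts_epoch"]))

prev = None
for row in rows:
    ts = int(row["ts_epoch"])
    if prev is not None and ts - prev > GAP_SECONDS:
        a = datetime.fromtimestamp(prev, tz=timezone.utc)
        b = datetime.fromtimestamp(ts, tz=timezone.utc)
        print(f"{(ts - prev) / 60:.0f} min gap: {a:%Y-%m-%d %H:%M} -> {b:%Y-%m-%d %H:%M} UTC")
    prev = ts
```

Run against the log above, it would have flagged the roughly 32-minute gap between the 15:57 and 16:29 entries.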

I think it’s working again now.

@avij how did you spot that so fast?! 🙂

Thanks everyone. One of the ClickHouse shards had run out of storage (they are ~150 GB each), with the new system generating much more data. Monitoring was still happening, but the graphs run off the ClickHouse database, which (depending on which shard was used) wasn’t accepting data consistently.

It should all be backfilled now.
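For future reference, a small watcher along these lines could catch a shard filling up before inserts start failing. This is only a sketch: the hostnames are made up, and the default HTTP port (8123) and the 10% free-space threshold are assumptions, not the pool’s actual deployment details. The `system.disks` table is standard ClickHouse.

```python
#!/usr/bin/env python3
"""Poll free space on each ClickHouse node via the HTTP interface."""
import urllib.parse
import urllib.request

HOSTS = ["ch-shard1.example.net", "ch-shard2.example.net"]  # hypothetical shard hosts
QUERY = "SELECT name, free_space, total_space FROM system.disks FORMAT TSV"

for host in HOSTS:
    url = f"http://{host}:8123/?" + urllib.parse.urlencode({"query": QUERY})
    with urllib.request.urlopen(url) as resp:
        for line in resp.read().decode().splitlines():
            name, free, total = line.split("\t")
            pct_free = int(free) / int(total) * 100
            warn = "  <-- LOW" if pct_free < 10 else ""
            print(f"{host} disk {name}: {pct_free:.1f}% free{warn}")
```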

(I’ve been moving house and working on moving the NTP Pool system to new server infrastructure, so my time to pay attention to the existing system has been stretched thin – I appreciate you all looking out for oddities!)

@ask, looks like the system is doing the same thing again.

Yes, I have been observing the same issue since around noon today, but only on some of my servers:

I checked, and it happens on all my servers except my monitor-only server, which seems to be checked the same amount.

And yes, it’s better to check less than before. Very good.

oof, thank you. The ClickHouse data for the historical monitoring scores (before counting however many replicas) is almost half a terabyte now; I wasn’t growing the disks fast enough.

I’ve been setting up the new clusters. As part of the data move, I’ll see if I can store the truly old data in a more manageable way.
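One mechanism that might fit the “more manageable” goal is a ClickHouse table TTL that moves parts past a certain age onto a cheaper volume. The sketch below is purely illustrative: the host, the table name (`log_scores`), the timestamp column (`ts`), and the two-year window are all invented, and it assumes the storage policy defines a `cold` volume.

```python
#!/usr/bin/env python3
"""Send a TTL change to ClickHouse so old parts move to a cold volume."""
import urllib.request

HOST = "ch-shard1.example.net"  # hypothetical node
DDL = (
    "ALTER TABLE log_scores "                            # hypothetical table name
    "MODIFY TTL ts + INTERVAL 2 YEAR TO VOLUME 'cold'"   # needs a 'cold' volume in the storage policy
)

# ClickHouse's HTTP interface accepts the statement as the POST body.
req = urllib.request.Request(f"http://{HOST}:8123/", data=DDL.encode())
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode() or "OK")
```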

No, I’m in California. 😂

@elvisimprsntr credit where credit is due: xkcd: Dependency

More gaps in monitoring data.

Doesn’t instill confidence in pool.ntp.org. I’ll keep using NIST and USNO as backups to my GPS+PPS-disciplined NTP server.
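For anyone doing the same kind of cross-check, here is a bare-bones SNTP query in stdlib Python. The server name is just an example from those mentioned; it only reads the reply’s transmit timestamp and is no substitute for a proper NTP client.

```python
#!/usr/bin/env python3
"""Bare-bones SNTP query: read one transmit timestamp from a backup server."""
import socket
import struct
import time

SERVER = "time.nist.gov"       # example; tick.usno.navy.mil works the same way
NTP_TO_UNIX = 2208988800       # seconds between the 1900 and 1970 epochs

packet = b"\x1b" + 47 * b"\x00"  # LI=0, VN=3, Mode=3 (client request)
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.settimeout(5)
    sock.sendto(packet, (SERVER, 123))
    data, _ = sock.recvfrom(512)

# Transmit timestamp (seconds field) sits at bytes 40..43 of the 48-byte reply.
tx = struct.unpack("!I", data[40:44])[0] - NTP_TO_UNIX
print(f"{SERVER}: {time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(tx))} UTC, "
      f"delta vs local clock ~{tx - time.time():+.1f}s")
```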

Same here… it happens sometimes. Probably Ask was working on something.

See mine:

Exact same gap, at the same time. It has no impact on clients that query the pool servers.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.