Monitoring data availability

For a (very long) while I’ve wanted to fix how the system stored the monitoring data long term. The old system was the most frequent point of breakage needing manual intervention, and it wouldn’t sustainably handle more frequent monitoring probes or more monitoring systems.

It took a few false starts before I got something that seems like it’ll be both practical (as in it can work simply and reliably with minimal care and feeding) and reasonably extensible.

So far I have it saving data to Avro files and to a local ClickHouse instance.
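To give a rough idea of the ClickHouse side, here’s a minimal sketch of writing probe results with the clickhouse-connect Python client. The table name and columns are made up for illustration, not the actual schema:

```python
# Minimal sketch only -- "log_scores" and its columns are hypothetical,
# not the real table layout used by the pool.
from datetime import datetime, timezone

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# Create a simple MergeTree table for probe results.
client.command("""
    CREATE TABLE IF NOT EXISTS log_scores (
        ts         DateTime,
        server_id  UInt32,
        monitor_id UInt32,
        score      Float32,
        offset     Float64
    ) ENGINE = MergeTree
    ORDER BY (server_id, ts)
""")

# Insert one probe result (the values are made up).
client.insert(
    "log_scores",
    [[datetime.now(timezone.utc), 42, 1, 18.7, 0.0012]],
    column_names=["ts", "server_id", "monitor_id", "score", "offset"],
)
```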

The Avro files are available at https://logscores.ntppool.org/beta/. The beta files are about 3GB combined. The production files are at /pool/ and are currently about 58GB. Neither has been updated for a few weeks; I’ll get the beta files refreshed soon and the production files not long after.
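If you just want to peek inside one of the files, the fastavro Python library reads them directly. The filename below is a placeholder – use whichever file you download from the /beta/ listing:

```python
# Minimal sketch: inspect a downloaded Avro file.
# "logscores-example.avro" is a placeholder name, not an actual file
# from the /beta/ listing.
from fastavro import reader

with open("logscores-example.avro", "rb") as fh:
    avro_reader = reader(fh)
    print(avro_reader.writer_schema)   # the schema embedded in the file
    for i, record in enumerate(avro_reader):
        print(record)                  # one monitoring entry per record
        if i >= 4:                     # only show the first few records
            break
```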

I think it costs about $7 when someone downloads all the /pool/ files, so if that turns out to be popular it probably won’t be sustainable…

I’ve also loaded the Avro files into BigQuery (console). The beta data is in the “ntpbeta” dataset and the production data is in the “ntppool” dataset.

I’m not totally sure how to share links to that, but this might work as an example.

If you have a Google Cloud account you should be able to do a terabyte of queries per month for free.
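As a starting point, a query from Python could look something like this. The project id, table name, and column names below are placeholders (check the BigQuery console for the real ones); only the “ntpbeta” / “ntppool” dataset names above are actual:

```python
# Sketch of querying the BigQuery copy of the data.
# "your-project", "log_scores", and the column names are placeholders;
# adjust them to whatever appears in the BigQuery console.
from google.cloud import bigquery

client = bigquery.Client()  # uses your own Google Cloud credentials/project

query = """
    SELECT server_id, AVG(score) AS avg_score, COUNT(*) AS probes
    FROM `your-project.ntppool.log_scores`   -- placeholder table name
    WHERE ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    GROUP BY server_id
    ORDER BY avg_score
    LIMIT 20
"""

for row in client.query(query).result():
    print(row.server_id, row.avg_score, row.probes)
```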

I’m also planning to add a table with some server metadata (country zones, IP type, and probably the actual server IPs).

If this proves popular, making it available as a torrent may be a viable option if bandwidth charges are an issue. It may also be worth talking to a few CDN providers and asking if they might be willing to provide something at low / no cost, given the public benefit purpose of the pool project.

Yeah, the logscores.ntppool.org site is actually already hosted on Fastly. The production files are at https://logscores.ntppool.org/pool/

I suspect the easiest way for most people to look at the data is via BigQuery.

I can’t really see why someone would want all the pool files. I can understand why they would want their own server(s) for diagnostic purposes, but that’s about it. Maybe make sure your robots.txt excludes that directory so spiders aren’t downloading them.

Due to a legal issue, I need history from back in the day to prove that, in all likelihood, my server has only been connected to as a time server. But the Avro files only contain a server_id, and I have no idea how to match that to my IP address as proof.

Is there any place I should look?

I don’t know if this will help, but the https://www.ntppool.org/scores/ pages include a server ID number in small text at the bottom (above the “go up” link). I don’t know if it’s the same ID as used in the Avro files.

Yup, you are correct!

Didn’t spot that! Thanks a lot!

I’d like to add the IPs to the data, too, and a reference to the set of zones the server was registered in at the time of the monitoring probe.

I imagine it’d be two new tables:

  • one with all the IPs – I’m concerned about this because I worry that people will do something dumb or thoughtless with the data. It’s public anyway, but having it be a bit of trouble to put together might prevent dumb things.
  • another table with all the combinations of zones, so each monitoring data entry can just reference that “list of zones” table with an integer instead of having to enumerate each zone. In some of the archive formats that might help with the data size (or maybe not… maybe they all compress fine and it’s pointless – something that’d need testing). A rough sketch of the idea is below this list.
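Here’s a rough sketch of the zone-set idea – nothing implemented yet, and the zone names and ids are just examples:

```python
# Sketch of the "list of zones" lookup idea: each unique combination of
# zones gets one integer id, and each monitoring entry stores only that id.
# Zone names and values here are made up for illustration.

zone_sets: dict[frozenset[str], int] = {}

def zone_set_id(zones: list[str]) -> int:
    """Return the integer id for this combination of zones, creating one if new."""
    key = frozenset(zones)
    if key not in zone_sets:
        zone_sets[key] = len(zone_sets) + 1
    return zone_sets[key]

# A server registered in these zones at probe time...
entry_zone_set = zone_set_id(["@", "europe", "de"])

# ...so the monitoring entry only carries the integer reference.
monitoring_entry = {"server_id": 42, "score": 18.7, "zone_set_id": entry_zone_set}
print(monitoring_entry)
print(zone_sets)
```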

Having the zone data in the monitoring archives would be interesting because then we could do queries on monitoring problems on a per-zone / per-country basis.