Monitoring data availability

For a (very long) while I’ve wanted to fix how the system stored the monitoring data long term. The old system was the most frequent point of breakage needing manual intervention, and it wouldn’t sustainably handle more frequent monitoring probes or more monitoring systems.

It took a few false starts before I got something that seems like it’ll be both practical (as in it can work simply and reliably with minimal care and feeding) and reasonably extensible.

So far I have it saving data to Avro files and to a local ClickHouse instance.
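To give a rough idea of the ClickHouse side, here’s a minimal sketch of writing probe results with the clickhouse-connect Python client. The table name and columns are made up for illustration, not the actual schema:

```python
# Minimal sketch only -- "log_scores" and its columns are hypothetical,
# not the real table layout used by the pool.
from datetime import datetime, timezone

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# Create a simple MergeTree table for probe results.
client.command("""
    CREATE TABLE IF NOT EXISTS log_scores (
        ts         DateTime,
        server_id  UInt32,
        monitor_id UInt32,
        score      Float32,
        offset     Float64
    ) ENGINE = MergeTree
    ORDER BY (server_id, ts)
""")

# Insert one probe result (the values are made up).
client.insert(
    "log_scores",
    [[datetime.now(timezone.utc), 42, 1, 18.7, 0.0012]],
    column_names=["ts", "server_id", "monitor_id", "score", "offset"],
)
```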

The Avro files are available at https://logscores.ntppool.org/beta/. The beta files are about 3GB combined. The production files are at /pool/ and are currently about 58GB. Neither has been updated for a few weeks; I’ll get the beta files refreshed soon and the production files not long after.
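If you just want to peek inside one of the files, the fastavro Python library reads them directly. The filename below is a placeholder – use whichever file you download from the /beta/ listing:

```python
# Minimal sketch: inspect a downloaded Avro file.
# "logscores-example.avro" is a placeholder name, not an actual file
# from the /beta/ listing.
from fastavro import reader

with open("logscores-example.avro", "rb") as fh:
    avro_reader = reader(fh)
    print(avro_reader.writer_schema)   # the schema embedded in the file
    for i, record in enumerate(avro_reader):
        print(record)                  # one monitoring entry per record
        if i >= 4:                     # only show the first few records
            break
```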

I think it costs about $7 when someone downloads all the /pool/ files, so if that turns out to be popular it probably won’t be sustainable…

I’ve also loaded the Avro files into BigQuery (console). The beta data is in the “ntpbeta” dataset and the production data is in the “ntppool” dataset.

I’m not totally sure how to share links to that, but this might work as an example.

If you have a Google Cloud account you should be able to do a terabyte of queries per month for free.
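As a starting point, a query from Python could look something like this. The project id, table name, and column names below are placeholders (check the BigQuery console for the real ones); only the “ntpbeta” / “ntppool” dataset names above are actual:

```python
# Sketch of querying the BigQuery copy of the data.
# "your-project", "log_scores", and the column names are placeholders;
# adjust them to whatever appears in the BigQuery console.
from google.cloud import bigquery

client = bigquery.Client()  # uses your own Google Cloud credentials/project

query = """
    SELECT server_id, AVG(score) AS avg_score, COUNT(*) AS probes
    FROM `your-project.ntppool.log_scores`   -- placeholder table name
    WHERE ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    GROUP BY server_id
    ORDER BY avg_score
    LIMIT 20
"""

for row in client.query(query).result():
    print(row.server_id, row.avg_score, row.probes)
```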

I’m also planning to add a table with some server metadata (country zones, IP type, and probably the actual server IPs).

If this proves popular, making it available as a torrent may be a viable option if bandwidth charges are an issue. It may also be worth talking to a few CDN providers and asking if they might be willing to provide something at low / no cost, given the public benefit purpose of the pool project.

Yeah, the logscores.ntppool.org site is actually already hosted on Fastly. The production files are at https://logscores.ntppool.org/pool/

I suspect the easiest way for most people to look at the data is via BigQuery.

I can’t really see why someone would want all the pool files. I can understand why they would want their own server(s) for diagnostic purposes, but that’s about it. Maybe make sure your robots.txt excludes that directory so spiders aren’t downloading them.

Due to a legal issue, I need history from back in the day to prove that, in all likelihood, my server has only been connected to as a time server. But the Avro files only contain a server_id, and I have no idea how to match that to my IP address as proof.

Is there any place I should look?

I don’t know if this will help, but the https://www.ntppool.org/scores/ pages include a server ID number in small text at the bottom (above the “go up” link). I don’t know if it’s the same ID as used in the Avro files.

Yup, you are correct!

Didn’t spot that! Thanks a lot!

I’d like to add the IPs to the data, too, and a reference to the set of zones the server was registered in at the time of the monitoring probe.

I imagine it’d be two new tables:

  • one with all the IPs – I’m concerned about this because I worry that people will do something dumb or thoughtless with the data. It’s public anyway, but having it be a bit of trouble to put together might prevent dumb things.
  • another table with all the combinations of zones, so each monitoring data entry can just reference that “list of zones” table with an integer instead of having to enumerate each zone. In some of the archive formats that might help with the data size (or maybe not… maybe they all compress fine and it’s pointless – something that’d need testing). A rough sketch of the idea is below this list.
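Here’s a rough sketch of the zone-set idea – nothing implemented yet, and the zone names and ids are just examples:

```python
# Sketch of the "list of zones" lookup idea: each unique combination of
# zones gets one integer id, and each monitoring entry stores only that id.
# Zone names and values here are made up for illustration.

zone_sets: dict[frozenset[str], int] = {}

def zone_set_id(zones: list[str]) -> int:
    """Return the integer id for this combination of zones, creating one if new."""
    key = frozenset(zones)
    if key not in zone_sets:
        zone_sets[key] = len(zone_sets) + 1
    return zone_sets[key]

# A server registered in these zones at probe time...
entry_zone_set = zone_set_id(["@", "europe", "de"])

# ...so the monitoring entry only carries the integer reference.
monitoring_entry = {"server_id": 42, "score": 18.7, "zone_set_id": entry_zone_set}
print(monitoring_entry)
print(zone_sets)
```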

Having the zone data in the monitoring archives would be interesting because then we could do queries on monitoring problems on a per-zone / per-country basis.