Monitoring data availability


#1

For a (very long) while I’ve wanted to fix how the system stores the monitoring data long term. The old system was the most frequent point of breakage that needed manual intervention, and it wouldn’t sustainably handle more frequent monitoring probes or more monitoring systems.

It took a few false starts before I got something that seems like it’ll be both practical (as in, it can work simply and reliably with minimal care and feeding) and reasonably extensible.

So far I have it saving data to Avro files and to a local ClickHouse instance.

The Avro files are available at https://logscores.ntppool.org/beta/. The beta files total about 3GB. The production files are at /pool/ and are about 58GB currently. Neither set has been updated for a few weeks; I’ll get the beta files refreshed soon and the production files not long after.

I think it costs about $7 when someone downloads all the /pool/ files, so if that turns out to be popular that probably won’t be sustainable…
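As a rough back-of-the-envelope sketch of that estimate: the $7 and 58GB figures are the approximate numbers above, while the function names and the download counts are made up purely for illustration, and real bandwidth pricing may not scale this linearly.

```python
# Back-of-the-envelope bandwidth cost estimate.
# The two constants are the approximate figures from the post;
# the helper names and download counts below are hypothetical.

POOL_SIZE_GB = 58.0             # approximate size of the /pool/ files
COST_PER_FULL_DOWNLOAD = 7.0    # approximate USD cost of one full download


def cost_per_gb(total_cost: float = COST_PER_FULL_DOWNLOAD,
                size_gb: float = POOL_SIZE_GB) -> float:
    """Implied cost per GB transferred, assuming linear pricing."""
    return total_cost / size_gb


def monthly_cost(full_downloads: int,
                 cost_each: float = COST_PER_FULL_DOWNLOAD) -> float:
    """Estimated monthly cost for a given number of full downloads."""
    return full_downloads * cost_each


if __name__ == "__main__":
    print(f"Implied rate: ~${cost_per_gb():.2f} per GB")
    for downloads in (1, 10, 100):
        print(f"{downloads:>3} full downloads/month: ~${monthly_cost(downloads):.0f}")
```

Under those assumptions the implied rate is roughly $0.12 per GB, so a hundred full downloads a month would land around $700.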

I’ve also loaded the Avro files into BigQuery (console). The beta data is in the “ntpbeta” dataset and the production data is in the “ntppool” dataset.

I’m not totally sure how to share links to that, but this might work as an example.

If you have a google cloud account you should be able to do a terabyte of queries per month for free.

I’m planning to also add a table with some server metadata (country zones, IP type and probably the actual server IPs).


#2

I think it costs about $7 when someone downloads all the /pool/ files, so if that turns out to be popular that probably won’t be sustainable…

If this proves popular, making it available as a torrent may be a viable option if bandwidth charges are an issue. It may also be worth talking to a few CDN providers and asking if they might be willing to provide something at low / no cost, given the public benefit purpose of the pool project.


#3

Yeah, the logscores.ntppool.org site is actually already hosted on Fastly. The production files are at https://logscores.ntppool.org/pool/

I suspect the easiest way for most people to look at the data is via BigQuery.


#4

I can’t really see why someone would want all the pool files. I can understand why they would want the data for their own server(s) for diagnostic purposes, but that’s about it. Maybe make sure your robots.txt excludes that directory so spiders aren’t downloading them.
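A minimal sketch of how such a rule could be checked, using Python’s standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent and the `example.avro` file name are all hypothetical; the site’s real robots.txt may look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that excludes the /pool/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pool/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://logscores.ntppool.org/",
        "https://logscores.ntppool.org/pool/example.avro",  # hypothetical file name
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {verdict}")
```

This only affects crawlers that honour robots.txt; it does nothing to stop a person or a script from downloading the files deliberately.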