The beta system has a few changes.
Just to be really clear – this is just on the beta system.
- Two active “monitors” (!), which means that we’re getting closer to being able to run monitoring systems in Europe and Asia (and US East Coast?). This has been in the works for way too long.
- One of the monitors does 4 queries (2 seconds apart).
- The monitors weren’t running consistently for a little while; they’re back now (though run semi-manually while I’m testing and debugging).
- The old Perl monitor has been replaced with a new agent with more safety checks, more features and improved performance.
- The CSV log now includes an “error code”; it’s the “Kiss of Death” code (reference ID) or the socket error (for example “i/o timeout”), as appropriate.
- To see data for the individual monitors, add a monitor=* query parameter. The lines with an empty “monitor_name” are the aggregate score.
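For anyone post-processing the CSV log, here is a minimal Python sketch of separating the aggregate rows from the per-monitor rows. Only the “monitor_name” column comes from the description above; the other column names (`ts_epoch`, `score`, `error`) and the sample rows are made up for illustration, so check them against the real CSV header before relying on this.

```python
import csv
import io


def split_by_monitor(csv_text):
    """Split CSV log rows into aggregate rows and per-monitor rows.

    Rows with an empty "monitor_name" are treated as the aggregate score;
    every other row is grouped under its monitor's name.
    """
    aggregate = []
    per_monitor = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("monitor_name") or "").strip()
        if name:
            per_monitor.setdefault(name, []).append(row)
        else:
            aggregate.append(row)
    return aggregate, per_monitor


# Made-up sample data; every column except "monitor_name" is hypothetical.
SAMPLE = """ts_epoch,score,monitor_name,error
1,19.5,,
1,20.0,monitor-a,
1,-5.0,monitor-b,i/o timeout
"""

aggregate, per_monitor = split_by_monitor(SAMPLE)
for name, rows in sorted(per_monitor.items()):
    print(name, [r["score"] for r in rows])
```

The same function should work on the text of a log fetched with the monitor=* parameter, as long as the header row contains “monitor_name”.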
So far the “four samples” change is interesting. Before it goes to the production system, there’s some work to do to figure out whether it’s too aggressive (or whether servers that don’t answer four requests in 8 seconds are too picky).
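To make the question concrete, here is a rough Python sketch of what “four queries, 2 seconds apart” looks like from the server’s side. This is not the monitor agent’s actual code: `sample_server`, `all_answered` and `make_rate_limited` are hypothetical names, and the fake server that stops answering after two queries only stands in for a server with a strict rate limit.

```python
import time


def sample_server(query, samples=4, interval=2.0, sleep=time.sleep):
    """Call query() `samples` times, waiting `interval` seconds between calls.

    Returns a list of ("ok", value) or ("error", message) tuples, one per query.
    """
    results = []
    for i in range(samples):
        if i > 0:
            sleep(interval)
        try:
            results.append(("ok", query()))
        except OSError as exc:  # TimeoutError is a subclass of OSError
            results.append(("error", str(exc)))
    return results


def all_answered(results):
    """True only if every query in the burst got an answer."""
    return all(status == "ok" for status, _ in results)


def make_rate_limited(limit):
    """Hypothetical stand-in for a server that answers only `limit` queries."""
    count = {"n": 0}

    def query():
        count["n"] += 1
        if count["n"] > limit:
            raise TimeoutError("i/o timeout")
        return 0.001  # pretend clock offset in seconds

    return query


# Skip the real waits for the demo by passing a no-op sleep.
results = sample_server(make_rate_limited(2), sleep=lambda seconds: None)
print(results, all_answered(results))
```

A server that answers only the first two queries fails the “all four answered” check here, which is exactly the case where it’s unclear whether the monitor or the server is being unreasonable.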
I can find examples in the logs, but it feels like “outing” operators. I’m not sure if that really makes sense, but it means I won’t post examples right now. If you are running a server that’s affected by this change, I invite you to post the example so we can discuss.
Also, if you don’t have your server in the beta system, please consider adding it.