Problem with api.mon.ntppool.dev? net/http: timeout awaiting response headers

Hi, I see a large number of errors submitting monitoring results back with the error “net/http: timeout awaiting response headers”. But they seem to be intermittant, looks like some batches work, some don’t?

I am running in production, ntppool-agent version 4.1.2 on Debian with IPv4 andIPv6 connectivity.

Apr 28 10:16:12 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=8
Apr 28 10:16:22 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=11
Apr 28 10:16:31 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=4
Apr 28 10:16:38 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=2
Apr 28 10:16:58 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=2
Apr 28 10:17:06 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=1
Apr 28 10:17:33 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v6 batchID=01KQ9JM2A9EW87NVTSVK382JHY server=2001:67c:440:f620:91:236:251:35 err="network: read: connection refused"
Apr 28 10:17:39 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v6 batchID=01KQ9JM2A9EW87NVTSVK382JHY server=2001:1c04:5013:1201::3 err="network: i/o timeout"
Apr 28 10:17:39 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v6 batchID=01KQ9JM2A9EW87NVTSVK382JHY server=2001:440:1880:7373::2 err="network: i/o timeout"
Apr 28 10:17:41 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 count=7
Apr 28 10:17:59 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v6 batchID=01KQ9JMPP2N701E4031MTGDHJ7 server=2a10:d582:2b6a:123:1:123:1:2 err="network: i/o timeout"
Apr 28 10:18:03 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 count=21
Apr 28 10:18:15 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 count=9
Apr 28 10:18:28 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=6
Apr 28 10:18:34 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v6 batchID=01KQ9JNRJF4P0PR0QJGC4CSVHR server=2a02:c6c1:10:201:0:9e:0:12 err="network: i/o timeout"
Apr 28 10:18:35 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 count=11
Apr 28 10:18:37 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=2
Apr 28 10:18:42 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v6 batchID=01KQ9JPBMARA1VYXC8X38NY66F server=2001:19f0:5401:ebb::7b err="network: read: connection refused"
Apr 28 10:18:47 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 count=14
Apr 28 10:18:54 ntp-mon ntppool-agent[78329]: level=WARN msg="local-check failure" env=prod ip_version=v6 server=time.nist.gov ip=2610:20:6f97:97::6 err="offset too large: -10.586967ms" trace_id=bd017e8f44e3ffd6a4b9a7ae063e089c span_id=4966bfbca2874642
Apr 28 10:19:05 ntp-mon ntppool-agent[78329]: level=WARN msg="local-check failure" env=prod ip_version=v6 server=uklon5-ntp-002.aaplimg.com ip=2a01:b740:a16:4000::31 err="network: i/o timeout" trace_id=bd017e8f44e3ffd6a4b9a7ae063e089c span_id=4966bfbca2874642
Apr 28 10:19:05 ntp-mon ntppool-agent[78329]: level=INFO msg=local-check env=prod ip_version=v6 failures=2 threshold=3 hosts=8 trace_id=bd017e8f44e3ffd6a4b9a7ae063e089c span_id=4966bfbca2874642
Apr 28 10:19:06 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v4 batchID=01KQ9JPQXJ1QFDN259DA27JQSG server=89.28.74.161 err="network: i/o timeout"
Apr 28 10:19:16 ntp-mon ntppool-agent[78329]: level=ERROR msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 10:19:23 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 count=15
Apr 28 10:19:33 ntp-mon ntppool-agent[78329]: level=INFO msg=local-check env=prod ip_version=v4 failures=0 threshold=5 hosts=12 trace_id=7222f366a7cf460616841bd672af5baf span_id=4f2e2f9c8967baa7
Apr 28 10:19:42 ntp-mon ntppool-agent[78329]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 10:19:55 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=30
Apr 28 10:20:05 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v6 batchID=01KQ9JRH8YC1C8ZMDDVC8B7YRR server=2606:a300:1004:7::2 err="network: i/o timeout"
Apr 28 10:20:14 ntp-mon ntppool-agent[78329]: level=INFO msg="ntp error" env=prod ip_version=v4 batchID=01KQ9JRT1HS7WPBJ579XF69H6B server=154.26.137.94 err="network: i/o timeout"
Apr 28 10:20:15 ntp-mon ntppool-agent[78329]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 10:20:20 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=11
Apr 28 10:20:34 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=3
Apr 28 10:20:35 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 count=12
Apr 28 10:20:49 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 count=1
Apr 28 10:20:49 ntp-mon ntppool-agent[78329]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 err="getting server list: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/GetServers\": net/http: timeout awaiting response headers"
Apr 28 10:21:05 ntp-mon ntppool-agent[78329]: level=INFO msg="batch processing" env=prod ip_version=v6 monitor_ip=2a02:8108:4d82:e400:5054:ff:fe45:e603 count=3
Apr 28 10:21:17 ntp-mon ntppool-agent[78329]: level=ERROR msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 10:21:51 ntp-mon ntppool-agent[78329]: level=ERROR msg="batch processing" env=prod ip_version=v4 monitor_ip=95.89.38.215 err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"


And I noticed that on the website that under monitor it shows 0.0 Test/min.

Is it a problem on my end, or do others experience the same?

Have the same issue on my end. Directly poking at https://api.mon.ntppool.dev/monitor.v2.MonitorService/GetServers causes firefox to yell at me about HSTS failure with… unknown issuer?! https://ca.ntppool.org/ is a thing?

Yes, it uses self-signed certificates and therefore isn’t that compatible with current browsers. But for example

curl -kv https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults

(or your URL with GetServers) works for me and shows at least a HTTP 401 error and the response “Authentication required” when I tried. But I don’t want to poke too much at the production API.

In the wake of a recently completed infrastructure move, there currently still are some kinks that need ironing out, such as this one.

I.e., you are not alone with this issue, others experience it as well, see log of one of my monitors below.

Also check the graph on number of checks executed by monitors per second. If you go to the weekly view, you’ll see that it was much more stable, and that there were more checks (around 14000/s).

I believe Ask is already aware of this, else this thread will make him aware (he asked for feedback if things are amiss despite all the care taken during the move).

On one of my monitors:

Apr 28 18:05:47 vps-par4 ntppool-agent[17824]: level=WARN msg="local-check failure" env=prod ip_version=v6 server=uklon5-ntp-002.aaplimg.com ip=2a01:b740:a16:4000::31 err="network: i/o timeout" trace_id=95c50f2ca4b84a7158ecc124509d55e7 span_id=84aab6b6749cfc93
Apr 28 18:06:23 vps-par4 ntppool-agent[17824]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip= err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 18:07:01 vps-par4 ntppool-agent[17824]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip= err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 18:07:41 vps-par4 ntppool-agent[17824]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip= err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 18:08:25 vps-par4 ntppool-agent[17824]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip= err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 18:08:59 vps-par4 ntppool-agent[17824]: level=WARN msg="local-check failure" env=prod ip_version=v6 server=time.nist.gov ip=2610:20:6f97:97::4 err="offset too large: -16.846172ms" trace_id=0fd0f94d5c17799c8bc3036c11043544 span_id=1cc1cc813f717ea4
Apr 28 18:09:10 vps-par4 ntppool-agent[17824]: level=WARN msg="local-check failure" env=prod ip_version=v6 server=uklon5-ntp-002.aaplimg.com ip=2a01:b740:a16:4000::31 err="network: i/o timeout" trace_id=0fd0f94d5c17799c8bc3036c11043544 span_id=1cc1cc813f717ea4
Apr 28 18:09:40 vps-par4 ntppool-agent[17824]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip= err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 18:10:52 vps-par4 ntppool-agent[17824]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip= err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"
Apr 28 18:12:01 vps-par4 ntppool-agent[17824]: level=ERROR msg="batch processing" env=prod ip_version=v6 monitor_ip= err="SubmitResults: deadline_exceeded: Post \"https://api.mon.ntppool.dev/monitor.v2.MonitorService/SubmitResults\": net/http: timeout awaiting response headers"

Hi everyone – this should be better now.

The database is working much slower in the new cluster; I’m not sure why. The servers are newer, but with fast networking instead of ridiculous? Probably not that. There are only 4 servers in the cluster instead of more, maybe there’s more Ceph overhead?

Anyway, I made some fixes to the API code so it’s more efficient in how it bundles the queries in transactions. From the overall number of probes stored it’s not clear if it helped, but the very slow SubmitResults API calls are gone according to the tracing data.

ntppool-agent-4.1.5-1.x86_64

level=ERROR msg=“batch processing” env=prod err=“SubmitResults: unknown: twirp error internal: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction” trace_id=c15ce20a203fcb26c666e44065d6770c span_id=e155c242ed6af59f

Most of the batches seem to get processed fine, though. I didn’t notice this error message earlier, but looks like there have been such messages also before 4.1.5.

Yeah, I think those mostly need the next iteration of the database scheme (and Postgres). It got exposed to the clients when I made other changes to otherwise make the database have less load, I think. I was working on it last year, but had to put it aside for the cluster move.

Next week I’ll get more metrics in place to keep an eye on how that subsystem is doing.