"io timeout" in CSV logs

electronicslover · February 5, 2019, 12:37pm

Hi,

I want to ask about the “io timeout” that appears in the CSV log. I see the score dropping and the connection dropping. What should be done on the server side to avoid these errors? I operate a stratum 1 NTP server: a Raspberry Pi 3 without its original 19.2MHz crystal oscillator. We connected the crystal input to a DDS that is locked to our local time scale UTC(LRTE), which is traceable through the BIPM Time Department: https://goo.gl/oSzRrF

The server: https://www.ntppool.org/scores/143.107.229.210

The big drops were due to server disconnections and settings changes during upgrades, but this “io timeout” has always happened.

Best regards,

Luiz Paulo.

alica · February 5, 2019, 5:08pm

The most probable reason is an overwhelming number of queries rendering your server unresponsive. Try reducing your server’s bandwidth (net speed) setting and see if it helps.

electronicslover · February 5, 2019, 7:45pm

Thanks for the reply! I changed it to 10 Mbit/s. Here we have a bandwidth of 1 Gbit/s, but the Raspberry Pi’s Ethernet only supports 10/100 connections (Pi 3B, not the Plus). Do you think it is good to keep this value (10 Mbit/s), or can I increase it to something near 100 Mbit/s?

kennethr · February 5, 2019, 7:51pm

The net speed is used to balance the load between the pool servers. If your connection is asymmetric (like most DSL connections) you should use the lower speed.

The amount of queries your server will get is generally directly proportional to the netspeed, so a 50Mbit setting will get about 5 times more traffic than the 10Mbit setting. The pool will only use a fraction of the “netspeed setting”. Be aware that the amount of queries to your server will grow over time.
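
To make the proportionality concrete, here is a minimal Python sketch. It assumes the query load scales linearly with the netspeed setting, which is only an approximation; `relative_load` is a hypothetical helper name, not part of any pool software.

```python
def relative_load(netspeed_kbit: int, baseline_kbit: int) -> float:
    """Return how many times more queries a setting attracts than a baseline.

    Assumption: traffic is directly proportional to the netspeed setting.
    """
    if netspeed_kbit <= 0 or baseline_kbit <= 0:
        raise ValueError("netspeed values must be positive")
    return netspeed_kbit / baseline_kbit


# Compare a 50 Mbit setting against a 10 Mbit setting.
print(relative_load(50_000, 10_000))
```

The same ratio works in the other direction: lowering the setting lowers the expected share of queries by the same factor.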

littlejason99 · February 5, 2019, 8:48pm

Depending on which NTP program you use, there are commands to monitor performance and you can see if it’s dropping packets or if you are just experiencing a network issue between you & the monitoring server.
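
One program-independent way to check for dropped packets on Linux is to look at the kernel’s UDP counters in /proc/net/snmp. The sketch below parses the Udp header and value lines from that file’s text. The helper name `parse_udp_counters` is made up for illustration, and the code assumes the usual two-line layout (a header line followed by a value line).

```python
def parse_udp_counters(snmp_text: str) -> dict:
    """Extract the kernel's UDP counters from /proc/net/snmp-style text."""
    # Keep only the "Udp:" lines; "UdpLite:" lines do not match this prefix.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def looks_like_local_drops(before: dict, after: dict) -> bool:
    """True if receive-side error counters grew between two snapshots."""
    keys = ("InErrors", "RcvbufErrors")
    return any(after.get(k, 0) > before.get(k, 0) for k in keys)
```

On a live system the input would be the contents of /proc/net/snmp, read twice a few minutes apart. If `InErrors` or `RcvbufErrors` keeps growing while the pool reports timeouts, the server itself is probably dropping packets. If they stay flat, the loss is more likely happening elsewhere, for example on the network path or in a firewall in front of the UDP socket.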

ask · February 7, 2019, 4:06pm

The program that does the monitoring is at https://github.com/ntppool/monitor - review and patches are welcome. The production system is configured to just do “one probe”. The old system did more (sorta).

I have been working on making the long term logs more useful (and accessible). In the process I noticed that the “worse monitoring” started around March last year (and not during the summer when various network changes were done, to my surprise).

It used to be that about 0.5% of the monitoring probes to servers that have recently worked would fail. This is still the case for IPv6, but for IPv4 servers it’s gone up to around 1%.

(I’ll post the queries and data in more detail later when I’m done with the data processing; the server doing it is using terabytes of (compressed) space temporarily, and it’s pretty slow to work with).

electronicslover · February 9, 2019, 5:18pm

I’m using ntpd from Raspbian.

electronicslover · February 9, 2019, 5:23pm

It seems more stable now. I’ve dropped the server speed setting to 768 kbit; this is a little shameful, but it keeps the server available. I also tuned some sysctl parameters on the server side, such as:

net.netfilter.nf_conntrack_udp_timeout = 120
net.netfilter.nf_conntrack_udp_timeout_stream = 720
net.netfilter.nf_conntrack_buckets = 48256
net.netfilter.nf_conntrack_max = 262144
net.nf_conntrack_max = 262144

net.core.netdev_max_backlog = 350000
net.core.rmem_max = 38414400
net.core.rmem_default = 38414400
net.ipv4.udp_rmem_min = 24384
net.ipv4.udp_wmem_min = 24384
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
net.core.somaxconn = 16384

I don’t know if changing tcp_congestion_control has any real effect on NTP, considering it uses the UDP protocol. Do you have any other recommendations? The server is at a public university, where we have a real IP address with full internet access. I’m now measuring a conntrack count of 220324, which is a huge value; the stock value is around 65536… What do you think?
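
As a rough check of how close the conntrack table is to its limit, the count can be compared with nf_conntrack_max. This is a minimal sketch using the numbers from this post; `conntrack_utilisation` is a hypothetical helper name.

```python
def conntrack_utilisation(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table that is in use."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return count / maximum


# Figures quoted in the post: 220324 tracked entries.
with_raised_limit = conntrack_utilisation(220_324, 262_144)
with_stock_limit = conntrack_utilisation(220_324, 65_536)

print(f"raised limit: {with_raised_limit:.0%} full")
print(f"stock limit:  {with_stock_limit:.0%} of capacity")
```

A value above 100% for the stock limit means the table would already have overflowed at that size, and once a conntrack table is full, new packets that need an entry are dropped. On a live system the two inputs can be read from /proc/sys/net/netfilter/nf_conntrack_count and /proc/sys/net/netfilter/nf_conntrack_max.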

electronicslover · February 9, 2019, 5:30pm

I checked our internet distribution system here: the connection is symmetric.

kennethr · February 9, 2019, 5:48pm

You should not use conntrack for NTP.
Here is an example of using NOTRACK with iptables,
where xxx.xxx.xxx.xxx is your server’s IP address:

sudo iptables -t raw -A PREROUTING -d xxx.xxx.xxx.xxx -p udp --dport 123 -j NOTRACK
sudo iptables -t raw -A OUTPUT -s xxx.xxx.xxx.xxx -p udp --sport 123 -j NOTRACK

See if you can set a higher net speed after this.
You can also count the nf_conntrack entries; try it before and after adding the iptables rules, with this command:

cat /proc/net/nf_conntrack | wc -l
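
To see how many of those entries are NTP rather than other traffic, the same file can be filtered by protocol and port. The sketch below is an illustration only: `count_ntp_entries` is a hypothetical helper, and it assumes the common /proc/net/nf_conntrack line layout, where the first `dport=` field belongs to the original direction of the flow.

```python
def count_ntp_entries(conntrack_text: str, port: int = 123) -> int:
    """Count UDP conntrack entries whose original direction targets `port`."""
    needle = f"dport={port}"
    count = 0
    for line in conntrack_text.splitlines():
        fields = line.split()
        if "udp" not in fields:
            continue  # skip TCP, ICMP and other protocols
        # The first dport= field describes the original (incoming) direction.
        first_dport = next((f for f in fields if f.startswith("dport=")), None)
        if first_dport == needle:
            count += 1
    return count
```

Run against the contents of /proc/net/nf_conntrack (reading it usually needs root), the result should fall to roughly zero once the NOTRACK rules are active, while the total line count only reflects the remaining non-NTP connections.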

kennethr · February 9, 2019, 6:06pm

If you are behind a NAT router, then you will also need to check if you can turn off conntrack on the router, or raise the state count.

electronicslover · February 9, 2019, 6:07pm

I’m not using a NAT router; it’s a direct IP address, 143.107.229.210, if you want to check it.

About these commands, what do they do?

kennethr · February 9, 2019, 6:16pm

It disables conntrack for that traffic, so the firewall does not save the state of those connections. Since UDP is stateless, it does not require an entry in the state table.

electronicslover · February 9, 2019, 6:19pm

Can I still use conntrack -L -p udp | wc -l to show it?

It dropped from 200000 to 200 and is increasing.

kennethr · February 9, 2019, 6:29pm

You can still use the command, but it will no longer count port 123 traffic; tracking all of those connections is what was exhausting the server.

So if you get around 200 to 1000, it works: the count then only shows connections other than NTP.

electronicslover · February 9, 2019, 6:32pm

Nice!! So saving the UDP connection states decreases server performance? Why?

kennethr · February 9, 2019, 6:40pm

It all depends on how much memory and CPU you have.
An example:

Each state consumes approximately 1KB of RAM. So for 1,000,000 states, 1GB of RAM would be required just for states.

Each connection flowing through the firewall will consume two states: One state when entering, and one state when exiting.

So when the firewall has to handle a high number of states, performance can suffer as well.

That’s why it’s a good idea to turn on NOTRACK for NTP.
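
The arithmetic above can be sketched in a few lines of Python. The 1KB-per-state figure is the approximation quoted in this post, and the real size depends on the firewall and kernel, so treat the result as an order-of-magnitude estimate; `state_table_ram_bytes` is a hypothetical helper name.

```python
def state_table_ram_bytes(states: int, bytes_per_state: int = 1024) -> int:
    """Estimate the RAM used by a state table of the given size.

    Assumption: each state costs about 1 KB, as quoted in the post.
    """
    if states < 0 or bytes_per_state <= 0:
        raise ValueError("states must be >= 0 and bytes_per_state > 0")
    return states * bytes_per_state


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / 2**20


# One million states, and the 220324 entries measured earlier in the thread.
for states in (1_000_000, 220_324):
    print(states, "states ->", round(to_mib(state_table_ram_bytes(states)), 1), "MiB")
```

Under that assumption, the roughly 220000 entries measured earlier in the thread would take a noticeable share of the memory on a board with 1GB of RAM.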

electronicslover · February 9, 2019, 6:42pm

Very interesting, I will use this too for the OpenWrt RPi server in my home. As with this RPi NTP server, I only have 1GB of RAM… And… by this logic, logging this can eat a lot of RAM!!

electronicslover · February 9, 2019, 8:57pm

Some pictures from my setup:

Symmetricom 5071;
HROG 10 Micro Phase Stepper;
Stanford DDS;
Raspberry Pi 3 running without its crystal (external 19.2MHz locked to our UTC(LRTE) from the HROG) and PPS input.

The monitoring software is from our GPS receiver to compare local frequency with GPS frequency and keep our time tracked to UTC.

electronicslover · February 9, 2019, 9:00pm

Maybe this could run on this Pi server? Or does it need more powerful hardware?
