"io timeout" in CSV logs


#1

Hi,

I want to ask about the “io timeout” that appears in the CSV log. I see the score dropping and the connection dropping. What should be done on the server side to avoid these errors? I operate a stratum 1 NTP server, a Raspberry Pi 3 without its original crystal oscillator (19.2 MHz). We have connected the crystal input to a DDS that is driven by a local time scale, UTC(LRTE), which is traceable to the BIPM Time Department: https://goo.gl/oSzRrF

The server: https://www.ntppool.org/scores/143.107.229.210

The big drops were due to server disconnections and settings changes during upgrades, but this “io timeout” has always, always been there.

Best regards,

Luiz Paulo.


#2

The most probable reason is that the volume of queries is overwhelming your server and making it unresponsive. Try reducing your server’s bandwidth (net speed) setting and see if it helps.


#3

Thanks for the reply! I changed it to 10 Mbit/s. Here we have a bandwidth of 1 Gbit/s, but the Raspberry Pi Ethernet only supports 10/100 connections (Pi 3B, not the Plus). Do you think it is good to keep this value (10 Mbit/s), or can I increase it to something near 100 Mbit/s?


#4

The net speed is used to balance the load between the pool servers. If your connection is asymmetric (like most DSL connections) you should use the lower speed.

The number of queries your server will get is generally directly proportional to the netspeed, so a 50Mbit setting will get about 5 times more traffic than the 10Mbit setting. The pool will only use a fraction of the “netspeed setting”. Be aware that the number of queries to your server will grow over time.
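As a rough worked example of that proportionality, here is a minimal sketch. The function name and the starting figure of 200 queries per second are made up for illustration; only the linear scaling comes from the description above.

```shell
# Hypothetical helper: estimate the query rate after changing the netspeed,
# assuming the load scales linearly with the setting.
#   $1 = current netspeed (kbit)
#   $2 = query rate observed at the current setting (queries per second)
#   $3 = proposed netspeed (kbit)
scaled_load() {
    echo $(( $2 * $3 / $1 ))
}

# Illustrative numbers only: 200 q/s at 10 Mbit, moving to 50 Mbit.
scaled_load 10000 200 50000   # prints 1000
```

So if the server is already struggling at one setting, the estimate gives an idea of what a higher setting would ask of it.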


#5

Depending on which NTP program you use, there are commands to monitor performance and you can see if it’s dropping packets or if you are just experiencing a network issue between you & the monitoring server.
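One way to tell the two cases apart on Linux, whatever NTP program is in use, is to watch the kernel's UDP counters: if InErrors or RcvbufErrors keeps growing, packets are being dropped on the server itself. A small sketch follows; `udp_counter` is a hypothetical helper name, and it only assumes the usual layout of /proc/net/snmp (a line of names followed by a line of values). The counters cover all UDP traffic on the machine, not just port 123. With ntpd, `ntpq -c iostats` should additionally show the daemon's own packet counters.

```shell
# Print one named counter from the "Udp:" section of /proc/net/snmp-style
# input read on stdin. The first "Udp:" line holds the column names, the
# second one holds the values.
udp_counter() {
    awk -v want="$1" '
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Udp:" && seen  { if (want in col) print $(col[want]); exit }
    '
}

# Only run against the real file where it exists (Linux).
if [ -r /proc/net/snmp ]; then
    echo "UDP receive errors:        $(udp_counter InErrors < /proc/net/snmp)"
    echo "UDP receive buffer errors: $(udp_counter RcvbufErrors < /proc/net/snmp)"
fi
```

Running it twice a few minutes apart and comparing the numbers says more than a single reading, because the counters accumulate since boot.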


#6

The program that does the monitoring is at https://github.com/ntppool/monitor - review and patches are welcome. The production system is configured to just do “one probe”. The old system did more (sorta).

I have been working on making the long term logs more useful (and accessible). In the process I noticed that the “worse monitoring” started around March last year (and not during the summer when various network changes were done, to my surprise).

It used to be that about 0.5% of the monitoring probes to servers that have recently worked would fail. This is still the case for IPv6, but for IPv4 servers it’s gone up to around 1%.

(I’ll post the queries and data in more detail later when I’m done with the data processing. The server doing it is temporarily using terabytes of (compressed) space, so it’s pretty slow to work with.)


#7

I’m using ntpd from Raspbian.


#8

It seems more stable now. I’ve dropped the server speed setting to 768 kbit; this is a little shameful, but it keeps the server available. I also tuned some sysctl parameters on the server side, like:

net.netfilter.nf_conntrack_udp_timeout = 120
net.netfilter.nf_conntrack_udp_timeout_stream = 720
net.netfilter.nf_conntrack_buckets = 48256
net.netfilter.nf_conntrack_max = 262144
net.nf_conntrack_max = 262144

net.core.netdev_max_backlog = 350000
net.core.rmem_max = 38414400
net.core.rmem_default = 38414400
net.ipv4.udp_rmem_min = 24384
net.ipv4.udp_wmem_min = 24384
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
net.core.somaxconn = 16384

I don’t know whether changing tcp_congestion_control has any real effect on NTP, considering that it uses the UDP protocol. Do you have any other recommendations? The server is at a public university, where we have a public IP address with full internet access. Right now I’m measuring a conntrack count of 220324, which is a huge value; the stock limit is around 65536… What do you think?


#9

I checked the internet distribution system here; the connection is symmetric.


#10

You should not use conntrack for NTP traffic.
Here is an example that uses NOTRACK with iptables, where xxx.xxx.xxx.xxx is your server’s IP address:

sudo iptables -t raw -A PREROUTING -d xxx.xxx.xxx.xxx -p udp --dport 123 -j NOTRACK
sudo iptables -t raw -A OUTPUT -s xxx.xxx.xxx.xxx -p udp --sport 123 -j NOTRACK

See if you can set a higher net speed after this.
You can also count the nf_conntrack entries. Try it before and after adding the iptables rules, with this command:

cat /proc/net/nf_conntrack | wc -l
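Listing the whole table can be slow when it holds hundreds of thousands of entries. As a lighter alternative, the kernel exposes the current entry count and the limit as two sysctl files when the nf_conntrack module is loaded. A sketch, with `conntrack_pct` as a made-up helper name:

```shell
# Hypothetical helper: how full is the conntrack table, in percent?
#   $1 = current number of entries, $2 = configured maximum
conntrack_pct() {
    echo $(( $1 * 100 / $2 ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max

# These files only exist while nf_conntrack is loaded, so check first.
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    echo "conntrack table: $(conntrack_pct "$(cat "$count_file")" "$(cat "$max_file")")% full"
fi

# With the figures reported earlier in this thread (220324 of 262144):
conntrack_pct 220324 262144   # prints 84
```

A table that sits close to its maximum is a likely place for new packets to get dropped, which would fit the timeouts seen by the monitor.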


#11

If you are behind a NAT router, then you will also need to check if you can turn off conntrack on the router, or raise the state count.


#12

I’m not using a NAT router; it’s a direct IP address, 143.107.229.210, if you want to check it.

About these commands, what do they do?


#13

They disable conntrack for that traffic, so the states of those connections are not saved. Since UDP is stateless, it does not require a state in the table.


#14

Can I still use conntrack -L -p udp | wc -l to show it?

It dropped from 200000 to 200 and is increasing.


#15

You can still use the command, but it will no longer count port 123, since tracking all of those connections is what exhausts the server.

So if you get around 200 to 1000, then it works; it now shows only the connections other than NTP :)


#16

Nice!! So, saving the UDP connection states decreases server performance? Why?


#17

It all depends on how much memory and CPU you have.
An example:

Each state consumes approximately 1 KB of RAM. So for 1,000,000 states, 1 GB of RAM would be required just for states.

Each connection flowing through the firewall will consume two states: one state when entering, and one state when exiting.

So when it has to handle a high number of states, performance can suffer as well.

That’s why it’s a good idea to turn on NOTRACK for NTP.
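To put the rule of thumb into numbers, here is a small sketch. It takes the roughly 1 KB per state at face value; the real size of an entry depends on the kernel and the firewall, and `state_mem_mib` is just an illustrative name.

```shell
# Hypothetical helper: memory needed for a number of tracked states, in MiB.
#   $1 = number of states
#   $2 = size per state in KiB (defaults to 1, the rule of thumb above)
state_mem_mib() {
    echo $(( $1 * ${2:-1} / 1024 ))
}

state_mem_mib 1000000   # prints 976, i.e. close to 1 GB
state_mem_mib 220324    # prints 215, the count reported earlier in this thread
```

On a Pi with 1 GB of RAM, a couple of hundred MB just for states would be a noticeable share of the total.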


#18

Very interesting, I will use this too for the OpenWrt RPi server in my home; like this RPi NTP server, it has only 1 GB of RAM… And… by this logic, logging this can eat a lot of RAM!!


#19

Some pictures from my setup:

Symmetricom 5071;
HROG 10 Micro Phase Stepper;
Stanford DDS;
Raspberry Pi 3 running without crystal (external 19.2 MHz locked to our UTC(LRTE) from HROG) and PPS input.

The monitoring software comes with our GPS receiver; it compares the local frequency with the GPS frequency and keeps our time tracked to UTC.


#20

Maybe this could run on this Pi server? Or does it need more powerful hardware?