Getting beyond 10k qps?


#10

Quick update. My Bangalore server (3-core, 1 GB Digital Ocean for $15/month) is now handling peaks of 80k QPS, set to gigabit, on IPv4 and IPv6. The @mlichvar rsntp server works very well at these loads. Traffic is around 12 TB/month.


#11

Using rsntp, has anybody seen EINVAL/os error 22 errors? I get anywhere from 4 to 12 per hour, e.g.:

Jun 14 21:32:11 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 21:32:31 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 21:52:20 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 21:52:40 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 22:22:12 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 22:22:12 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 22:54:01 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 22:54:21 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 23:05:21 minime rsntp[6490]: Thread #1 failed to send packet: Invalid argument (os error 22)
Jun 15 00:10:39 minime rsntp[6490]: Thread #3 failed to send packet: Invalid argument (os error 22)

Everything seems to be working okay, so I’m trying to figure out if it’s a bad/bogus source IP in a packet or if for some reason some of the time data in the buffer is wrong… I’m guessing IP…


#12

@mlichvar cc the post above:


#13

I get that too and I think your guess is correct. It’s probably trying to respond to an invalid IP address. If you pull from git, the error message will include the remote address.


#14

Awesome, thanks! That’ll definitely help debug the errors!


#15

How do you change the port in ntpd so that it listens on port 11123, which rsntp reads from?


#16

I don’t think you can change the port in the standard ntpd distribution…


#17

I don’t think you can change the port on ‘ntpd’, but you CAN use the -I option when starting ntpd to make it bind only to 127.0.0.1 (on 123). Then when you start rsntp, you do something like:

-a 192.168.1.5:123 -s 127.0.0.1:123

Change the IP in -a, obviously :slightly_smiling_face:


#18

Thanks, I got it to work but I have to use one more public IP that ntpd itself can use.
In ntp.conf I changed from
interface listen mypublicip-1
to
interface listen 127.0.0.1
interface listen mypublicip-2
interface ignore wildcard
and then I start rsntp with
rsntp -4 4 -a mypublicip-1:123 -s 127.0.0.1:123 &

First I have to say that rsntp clearly works better than ntpd and can take a much larger load :grinning:
However, I have some issues, @mlichvar:

Question 1

Using IPv4 and IPv6 at the same time doesn’t work; it just dies even though it says that it starts:
$ rsntp -4 4 -6 4 -a 193.228.143.22:123 -b 2a03:8600::dd:123 -s 127.0.0.1:123
thread ‘main’ panicked at ‘called Result::unwrap() on an Err value: AddrParseError(())’, libcore/result.rs:945:5
note: Run with RUST_BACKTRACE=1 for a backtrace.
Server thread #1 listening on V4(193.228.143.22:123)
Server thread #2 listening on V4(193.228.143.22:123)
Server thread #3 listening on V4(193.228.143.22:123)
Server thread #4 listening on V4(193.228.143.22:123)

Question 2

It says “Address already in use” even though it isn’t, and it works anyway:
$ rsntp -4 4 -a 193.228.143.22:123 -s 127.0.0.1:123
Server thread #1 listening on V4(193.228.143.22:123)
Server thread #2 listening on V4(193.228.143.22:123)
Server thread #3 listening on V4(193.228.143.22:123)
Server thread #4 listening on V4(193.228.143.22:123)
thread ‘<unnamed>’ panicked at ‘Couldn’t bind socket: Address already in use (os error 98)’, src/main.rs:298:23
note: Run with RUST_BACKTRACE=1 for a backtrace.

Question 3

My servers receive approx. 3000-4000 KB/s but only send 1000-1800 KB/s.
Where did the other traffic go? It became a little better with rsntp, but
there is still a lot of traffic that doesn’t get a reply. I use rsntp with 4 cores.

Question 4

Did you know that it stops processing requests if you start it with & and then log out?
It doesn’t die; it just stops processing.


#19
  • Q1: The IPv6 address specified by the -b option needs to be in brackets, e.g. [2a03:8600::dd]:123
  • Q2: Something is probably listening on the IPv6 port 123. ss -u -6 -l -p should print what it is.
  • Q3: Maybe it is still too much for the CPU? If it’s fully loaded, the receive queue might be full and the packets getting dropped before rsntp can get them.
  • Q4: Interesting. Maybe redirecting the output from rsntp to /dev/null would prevent that?
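
To illustrate the Q1 answer: without brackets, the trailing ":123" cannot be told apart from one more group of the IPv6 address, because the whole string is itself a valid IPv6 address. The following is a minimal Python sketch of a strict parser. The function name split_host_port is made up for this example, and this is not how rsntp itself parses its options.

```python
import ipaddress


def split_host_port(text):
    """Split 'a.b.c.d:port' or '[v6addr]:port' into (address, port).

    Hypothetical helper for illustration; unbracketed IPv6 is rejected.
    """
    if text.startswith("["):
        host, sep, port = text[1:].partition("]:")
        if not sep:
            raise ValueError(f"expected '[addr]:port', got {text!r}")
        addr = ipaddress.IPv6Address(host)
    else:
        host, sep, port = text.rpartition(":")
        if not sep or ":" in host:
            raise ValueError("IPv6 addresses must be written as [addr]:port")
        addr = ipaddress.IPv4Address(host)
    return addr, int(port)


# The unbracketed form is ambiguous: the full string parses as an address
# whose last group is 0x123, so nothing marks ':123' as a port.
ambiguous = ipaddress.ip_address("2a03:8600::dd:123")
```

With this sketch, "193.228.143.22:123" and "[2a03:8600::dd]:123" both split cleanly, while "2a03:8600::dd:123" raises a ValueError.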

#20

Personally I just made a systemd unit file and ran it under systemd:

[Unit]
Description=Rust NTP Server
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/usr/local/sbin/rsntp --ipv4-threads 3 --ipv6-threads 3

[Install]
WantedBy=multi-user.target


#21

Q1: Works, thanks.
Q2: Correct, “interface ignore wildcard” doesn’t apply to IPv6, but if I start ntpd with -4 it starts without IPv6.
Q3: I looked up what the traffic was, and it’s my co-lo’s router sending ECMP traffic to hosts it shouldn’t.
Q4: That worked :grinning:
Q5: rsntp seems a bit fragile for the first few minutes after it has been started. It doesn’t happen often, but it has happened that it just stops processing requests without any message. I restart it until it’s stable.


#22

I have now tested rsntp for over 2 months and I must say that it’s very stable and uses resources better than ntp. I recommend it to others who need to reply to a few billion NTP requests.

Packet stats for 6 of my 8 ntp servers:

[Image: vlan161-1-mini]

Traffic stats for 6 of my 8 ntp servers:

[Image: vlan161-2-mini]


#23

So how can we monitor the rsntp stats? Is the script from the old ntp-pool mailing list post still applicable?


#24

Can you link to the old post? I googled but could not find anything.

What kind of stats do you want to monitor exactly? Packets, Bytes, Concurrent Connections?


#25

The original script: https://lists.ntp.org/pipermail/pool/2012-July/006049.html
It can only monitor ntpd packets, and that’s fine with me. Currently I am looking for available methods to monitor rsntp before switching.


#26

I browsed through the rsntp source and it’s pretty minimalist. I don’t see any sort of stats or output that it can produce. I don’t know how to program in Rust, but if you could talk to the developer, I don’t think it would be too terribly difficult to add some counters for the packets sent / received and have some way to query them.

Alternatively, you could set up some rules in iptables and track packet count / byte count that way for some basic numbers, and query them with a custom script to feed into some rrd-based logging. You could probably do the same with tcpdump, but it might be a little more CPU intensive. I’m sure there are other ways too.

I wish I had some scripts that would work, but I monitor my NTP sent/received packets via parsing the ‘ntpdc -c iostats’ command.

On a similar note that might give you some inspiration… Recently I wanted to track the IPs making NTP requests to my server. So I created a couple of rules in iptables to log all incoming NTP, which I wrote to a separate file with rsyslog (so as not to fill up my main syslog file). Well, the amount of traffic made this file grow very large very fast (because of all the data each entry contains), so that wasn’t going to work. All I really wanted to know was how many queries each IP made, and just separating it by day was fine. So first I went to work and wrote a little script that was a basic datagram socket server. Rsyslog output the log entries directly to it. The script parsed each log entry and inserted it into a MySQL database. The table had the IP, hit count, and date. Within about 30 minutes I was astonished at what I was seeing… Most IPs were okay, but there was a very small handful that had already made hundreds, if not thousands, of queries! For a protocol where a client should, over the long term (excluding iburst), make about one query per minute, I was getting a few IPs that were continuously making multiple queries per second!
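
The per-IP, per-day counting described above can be sketched roughly as follows in Python. This is not the original script: it keeps the counts in an in-memory Counter instead of MySQL, it assumes the log entries arrive as UDP datagrams on a made-up local port (5514), and it assumes the usual iptables LOG layout where the client address appears as SRC=…. All function names are hypothetical.

```python
import re
import socket
from collections import Counter
from datetime import date

# iptables LOG entries carry the client address as "SRC=<ip>".
SRC_RE = re.compile(r"\bSRC=(\S+)")


def count_line(counts, line, day):
    """Add one hit for the source IP found in a log line, keyed by day."""
    match = SRC_RE.search(line)
    if match:
        counts[(day.isoformat(), match.group(1))] += 1


def serve(counts, host="127.0.0.1", port=5514):
    """Receive log lines as datagrams forever and tally them.

    The address and port are assumptions; a real version would also
    flush the counts to a database periodically.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _ = sock.recvfrom(4096)
            count_line(counts, data.decode("utf-8", "replace"), date.today())
```

Calling counts.most_common(10) on the shared Counter then gives the busiest (day, IP) pairs, which is the kind of view that exposes the handful of abusive clients.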

So from there I added a few rules & logs in my firewall to limit requests per IP using ‘hashlimit’. After altering my DB table and script slightly, I now logged how many requests per IP were ‘accepted’ and how many were ‘dropped’. Again, it was a very eye-opening experience to see this small handful of clients being so abusive. Looking up some of the top offenders offered no insight as to a cause or common source. I did notice it was a little more common for the netblock to belong to various wireless carriers. Not just cell phones (there was a lot of China Telecom), but embedded-type devices (yay, which are probably hardcoded, with firmware that is never updated). One interesting block was Jasper Wireless - www.jasper.com - coming from about half a dozen IPs in their netblock. I can only assume they were testing devices or something, but why those devices needed to update at a rate that borders on a DoS attack, I do not know. They earned a permanent block in my firewall weeks ago, yet still seem to be querying away at tens of thousands of packets an hour… Anyhow, crunching some numbers, I found that about 0.2% of the IP addresses making NTP requests were having a portion of their requests dropped (due to too-frequent requests), yet those IPs were generating about 4% of the overall inbound NTP traffic.
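
To make the accepted/dropped bookkeeping concrete, here is a simplified Python model of per-IP rate limiting with a token bucket. It only illustrates the idea; it is not how the iptables hashlimit match is implemented, and the class name and the rate and burst values are made up.

```python
from collections import defaultdict


class PerIpLimiter:
    """Token bucket per source IP, tallying accepted and dropped packets."""

    def __init__(self, rate, burst):
        self.rate = rate            # tokens refilled per second
        self.burst = burst          # maximum tokens a bucket can hold
        self.buckets = {}           # ip -> (tokens, time of last packet)
        self.accepted = defaultdict(int)
        self.dropped = defaultdict(int)

    def allow(self, ip, now):
        """Return True if a packet from ip at time now (seconds) passes."""
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill for the time elapsed since the previous packet.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            self.accepted[ip] += 1
            return True
        self.buckets[ip] = (tokens, now)
        self.dropped[ip] += 1
        return False
```

A client polling once a minute never runs out of tokens with rate=1.0 and burst=2, while a client sending four packets per second has most of its packets dropped once the initial burst is used up.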

Sorry, I kind of rambled… But my point is, where there’s a will, there’s a way… Adding a couple of rules to iptables, even if they have no action, will still give you a good count of packets and bytes which you can query (look at the -x option for exact counts). You just need to make sure your rrd stats can handle the counter resetting (most have an option to enable that so the graphs don’t go bonkers).
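
As a rough Python sketch of that approach: the code below reads the packet and byte counters from a listing such as the one printed by iptables -L INPUT -v -x -n, and computes per-interval deltas that tolerate a counter reset. The sample listing is invented for illustration, the function names are hypothetical, and the rule is assumed to match UDP destination port 123 (shown as dpt:123).

```python
# Invented sample in the layout of `iptables -L INPUT -v -x -n`.
SAMPLE = """\
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination
  123456  9382656            udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:123
     789    59964 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22
"""


def read_counters(listing, match="dpt:123"):
    """Sum the pkts and bytes columns of every rule line containing match."""
    packets = octets = 0
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and match in fields:
            if fields[0].isdigit() and fields[1].isdigit():
                packets += int(fields[0])
                octets += int(fields[1])
    return packets, octets


def delta(previous, current):
    """Difference between two samples; a smaller value means a reset."""
    if current < previous:
        return current  # counter restarted from zero since the last sample
    return current - previous
```

A polling script would call read_counters on fresh output every interval and feed delta(previous, current) to the grapher, so a firewall reload or reboot shows up as a small value rather than a huge negative spike.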


#27

Jasper is a massive IoT product managed by Cisco. Loads of huge mobile carriers all around the world are using the platform. Quite possibly this is the NTP traffic from tens of thousands of end devices, and that will only grow as more and more SIMs get issued.

:-/


#28

… and yet their “Contact Us” page is broken. Guess they don’t need any more business?


#29

They’re planning to have a billion devices on their network, in partnership with the mobile carriers. Maybe that’s enough, even for Cisco?!