Getting beyond 10k qps?


#6

I just upgraded my Singapore Digital Ocean droplet to a three-core, 1 GB one ($15/month). I also switched it over to run rsntp from @mlichvar (https://github.com/mlichvar/rsntp)

What’s interesting about Singapore is how spiky the usage is:

[image: Singapore traffic graph]

I have the same 3-core setup in Bangalore, and that one really sees some traffic. I’ve seen peaks of 75k qps, and it’s handling about 2 billion requests a day (that was with the setting at 500M on IPv4; I just turned it up to the full 1G).

[image: Bangalore traffic graph]

Both of these machines are not doing much beyond NTP. rsntp seems to work really well. Here is the CPU usage for the Bangalore machine for the same 24 hours.

[image: Bangalore CPU usage graph]

The only weirdness I saw was that rsntp would not play nice with the floating IP setup Digital Ocean uses; not sure why. Regular ntpd and chrony both handle it fine.


#7

My guess is that the high-traffic parts correspond to times when your address was returned by DNS and the low-traffic parts are clients that remember the address (as NTP clients are supposed to do). There is a small number of servers in the zone, so the duty cycle is large.

What sampling interval do you use? I use collectd with the iptables plugin at a 1-second interval, which shows all the spikes very clearly.
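
For reference, a minimal version of that setup might look roughly like this (the rule comment is just a placeholder; the plugin only reads the counters of existing iptables rules, so the counting rule has to be added first):

# a rule with no -j target only counts
iptables -I INPUT -p udp --dport 123 -m comment --comment "ntp-in"

# /etc/collectd/collectd.conf (fragment)
Interval 1
LoadPlugin iptables
<Plugin iptables>
  Chain "filter" "INPUT" "ntp-in"
</Plugin>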

One difference is that rsntp may not respond from the same address the request was received on; it relies on the system routing configuration. Is it possible that the default route used a different address in the “floating” setup? I think a new option to bind rsntp’s sockets to specified addresses would help with that.


#8

My guess is that the high-traffic parts correspond to times when your address was returned by DNS and the low-traffic parts are clients that remember the address (as NTP clients are supposed to do). There is a small number of servers in the zone, so the duty cycle is large.

Probably true. I’m guessing that with the Bangalore server I’m in the cycle pretty much 100% of the time, which is why it looks more even. Those are screen grabs from the Digital Ocean console; not sure what the sampling interval is, but if I had to guess I’d say 1 minute. Putting real monitoring on the boxes is one of my tasks for this weekend.

Regarding rsntp, yeah, I think that’s it: the wrong IP in the reply. I don’t think binding to a specific address will help here, since the address in question isn’t aliased to an interface. I’m not sure what legacy ntpd and chrony are doing differently; Rust isn’t one of my languages, so I’ll probably punt for now.


#9

In order to respond with the correct source address, ntpd binds a separate socket to each local address, while chronyd sets the address for each packet using the PKTINFO control message. I’m not sure how I would do the latter in Rust, so I added an option to rsntp to bind its sockets to a specified address, if you want to try it. I think it should help.
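
For reference, the mechanism looks roughly like this in C (the language chronyd is written in). This is only a trimmed illustration of the control-message handling, not code taken from chronyd or rsntp, and error handling is omitted:

#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Enable once on the UDP socket: the kernel then attaches an IP_PKTINFO
 * control message to every received datagram. */
static void enable_pktinfo(int sock)
{
    int on = 1;
    setsockopt(sock, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Receive one request and reply from the address it was received on. */
static void serve_one(int sock)
{
    char buf[128];
    char ctrl[CMSG_SPACE(sizeof(struct in_pktinfo))];
    struct sockaddr_in client;
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    struct msghdr msg = {
        .msg_name = &client, .msg_namelen = sizeof(client),
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };

    ssize_t len = recvmsg(sock, &msg, 0);
    if (len < 0)
        return;

    /* ipi_addr in the received control message is the destination address
     * of the request, i.e. the local address it arrived on. */
    struct in_pktinfo pi;
    memset(&pi, 0, sizeof(pi));
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO)
            memcpy(&pi, CMSG_DATA(c), sizeof(pi));

    /* ... turn the request in buf into an NTP response ... */

    /* For the reply, ipi_spec_dst selects the source address the kernel
     * will use, so copy in the address the request was received on. */
    pi.ipi_spec_dst = pi.ipi_addr;

    msg.msg_controllen = sizeof(ctrl);
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = IPPROTO_IP;
    c->cmsg_type = IP_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof(pi));
    memcpy(CMSG_DATA(c), &pi, sizeof(pi));

    iov.iov_len = (size_t)len;
    sendmsg(sock, &msg, 0);
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in any = { .sin_family = AF_INET, .sin_port = htons(123) };

    bind(sock, (struct sockaddr *)&any, sizeof(any));  /* wildcard bind */
    enable_pktinfo(sock);
    for (;;)
        serve_one(sock);
}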


#10

Quick update. My Bangalore server (3-core, 1 GB Digital Ocean for $15/month) is now handling peaks of 80k qps, set to gigabit on IPv4 and IPv6. The @mlichvar rsntp server works very well at these loads. Traffic is around 12 TB/month.


#11

Using rsntp, has anybody seen EINVAL (os error 22) errors? I get anywhere from 4 to 12 per hour, e.g.:

Jun 14 21:32:11 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 21:32:31 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 21:52:20 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 21:52:40 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 22:22:12 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 22:22:12 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 22:54:01 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 22:54:21 minime rsntp[6490]: Thread #2 failed to send packet: Invalid argument (os error 22)
Jun 14 23:05:21 minime rsntp[6490]: Thread #1 failed to send packet: Invalid argument (os error 22)
Jun 15 00:10:39 minime rsntp[6490]: Thread #3 failed to send packet: Invalid argument (os error 22)

Everything seems to be working okay, so I’m trying to figure out if it’s a bad/bogus source IP in a packet or if for some reason some of the time data in the buffer is wrong… I’m guessing IP…
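
One thing that can produce exactly this error on Linux, just as an illustration, is a request arriving with a UDP source port of 0: the reply then has to go to destination port 0, which Linux rejects with EINVAL before anything is sent. A toy snippet that reproduces the message (192.0.2.1 is only a documentation address):

#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Pretend a client sent its request from source port 0. */
    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(0) };
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

    char reply[48] = { 0 };  /* the size of a basic NTP packet */
    if (sendto(sock, reply, sizeof(reply), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        printf("sendto: %s\n", strerror(errno));  /* Invalid argument */
    return 0;
}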


#12

@mlichvar, cc the post above.


#13

I get that too and I think your guess is correct. It’s probably trying to respond to an invalid IP address. If you pull from git, the error message will include the remote address.


#14

Awesome, thanks! That’ll definitely help debug the errors!


#15

How do you change the port in ntpd so that it listens on port 11123, which rsntp reads from?


#16

I don’t think you can change the port in the standard ntpd distribution…


#17

I don’t think you can change the port on ntpd, but you CAN use the -I option when starting ntpd to make it bind only to 127.0.0.1 (still on port 123). Then when you start rsntp, you do something like:

-a 192.168.1.5:123 -s 127.0.0.1:123

Change the IP in -a, obviously :slightly_smiling_face:
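
In full, that ends up as something like this (with your own public address in -a):

ntpd -I 127.0.0.1
rsntp -a 192.168.1.5:123 -s 127.0.0.1:123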


#18

Thanks, I got it to work, but I have to use one extra public IP that ntpd itself can use.
In ntp.conf I changed from
interface listen mypublicip-1
to
interface listen 127.0.0.1
interface listen mypublicip-2
interface ignore wildcard
and then I start rsntp with
rsntp -4 4 -a mypublicip-1:123 -s 127.0.0.1:123 &

First, I have to say that rsntp clearly works better than ntpd and can take a much larger load :grinning:
However, some issues, @mlichvar:

Question 1

It doesn’t work to use IPv4 and IPv6 at the same time; it just dies even though it says it is starting:
$ rsntp -4 4 -6 4 -a 193.228.143.22:123 -b 2a03:8600::dd:123 -s 127.0.0.1:123
thread ‘main’ panicked at ‘called Result::unwrap() on an Err value: AddrParseError(())’, libcore/result.rs:945:5
note: Run with RUST_BACKTRACE=1 for a backtrace.
Server thread #1 listening on V4(193.228.143.22:123)
Server thread #2 listening on V4(193.228.143.22:123)
Server thread #3 listening on V4(193.228.143.22:123)
Server thread #4 listening on V4(193.228.143.22:123)

Question 2

It says “Address already in use” even though it isn’t, and it works anyway:
$ rsntp -4 4 -a 193.228.143.22:123 -s 127.0.0.1:123
Server thread #1 listening on V4(193.228.143.22:123)
Server thread #2 listening on V4(193.228.143.22:123)
Server thread #3 listening on V4(193.228.143.22:123)
Server thread #4 listening on V4(193.228.143.22:123)
thread ‘’ panicked at ‘Couldn’t bind socket: Address already in use (os error 98)’, src/main.rs:298:23
note: Run with RUST_BACKTRACE=1 for a backtrace.

Question 3

My servers receive approximately 3000-4000 KB/s but send only 1000-1800 KB/s.
Where did the other traffic go? It became a little better with rsntp, but
there is still a lot of traffic that doesn’t get a reply. I use rsntp with 4 cores.

Question 4

Did you know that it stops processing requests if you start it with & and then log out?
It doesn’t die; it just stops processing.


#19
  • Q1: The IPv6 address specified by the -b option needs to be in brackets, e.g. [2a03:8600::dd]:123
  • Q2: Something else is probably listening on port 123 over IPv6; ss -u -6 -l -p should print what it is.
  • Q3: Maybe it is still too much for the CPU? If it’s fully loaded, the receive queue might be full and packets are getting dropped before rsntp can get them (see the check sketched below this list).
  • Q4: Interesting. Maybe redirecting the output from rsntp to /dev/null would prevent that?
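
For Q3, the kernel counters should show whether that is happening; something like:

netstat -su | grep -A 6 '^Udp:'
ss -uln '( sport = :123 )'

The first command prints the cumulative UDP “packet receive errors” / “receive buffer errors” counters, which grow when the queue overflows; the second shows the current receive-queue depth of the sockets bound to port 123.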

#20

Personally I just made a systemd unit file and run it under systemd:

[Unit]
Description=Rust NTP Server
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/usr/local/sbin/rsntp --ipv4-threads 3 --ipv6-threads 3

[Install]
WantedBy=multi-user.target
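
If you go that route, save it as e.g. /etc/systemd/system/rsntp.service and then:

sudo systemctl daemon-reload
sudo systemctl enable --now rsntp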


#21

Q1: Works, thanks.
Q2: Correct, “interface ignore wildcard” doesn’t apply to IPv6, but if I start ntpd with -4 it starts without IPv6.
Q3: I looked up what the traffic was, and it’s my co-lo’s router that sends ECMP traffic to hosts it shouldn’t.
Q4: That worked :grinning:
Q5: rsntp seems a bit fragile for the first few minutes after it has just been started. It doesn’t happen often, but it has happened that it just stops processing requests without any message. I restart it until it’s stable.


#22

Now I have tested rsntp for over 2 months and I must say that it’s very stable and uses resources better than ntpd. I recommend it to others who need to reply to a few billion NTP requests.

Packet stats for 6 of my 8 ntp servers:

[image: vlan161-1-mini]

Traffic stats for 6 of my 8 ntp servers:

[image: vlan161-2-mini]


#23

So how can we monitor rsntp stats? Is the script found in the old ntp-pool mailing list article still applicable?


#24

Can you link to the old post? I googled but could not find anything.

What kind of stats do you want to monitor exactly? Packets, Bytes, Concurrent Connections?


#25

The original script: https://lists.ntp.org/pipermail/pool/2012-July/006049.html
It can only monitor ntpd packets, and that’s fine with me. Currently I am looking for available methods to monitor rsntp before switching.
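
The simplest method I can think of, and one that works no matter which daemon is answering, is to count the packets in the kernel rather than asking the daemon, e.g. the collectd + iptables approach mentioned earlier in the thread, or just a pair of counting rules (no -j target, so they only count) whose counters get read periodically:

iptables -I INPUT -p udp --dport 123
iptables -I OUTPUT -p udp --sport 123
iptables -nvxL INPUT | grep 'dpt:123'
iptables -nvxL OUTPUT | grep 'spt:123'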