rsNTP server + multihomed (more than one IP) host. Server only answers on the first IP. Please help

Hi,

I am using rsntp to spread the load of the NTP queries to more than one CPU core, since with one core ntpd or chrony are not up to the task and drop packets.

rsntp runs fine, but it only answers requests on one IP address. I cannot get it to answer on all interfaces.

https://github.com/mlichvar/rsntp

Do you have any ideas on how I could solve this?
How do you do it with high-load NTP servers (> 500 Mbps actual bandwidth requirement)?

Are you sure it’s the CPU and NTP daemon that are dropping packets?

How do you know?

Maybe this helps with getting multiple IPs on one LAN port:

And you probably need some routing to get a sort of round-robin.

How many threads are you using?
mlichvar (also registered here) was using an E5-1220 CPU (4 cores) and a 1 Gb/s Intel I350 with 4 threads, and achieves nearly the full 1 Gb/s.

rsntp --help
Options:
    -4, --ipv4-threads NUM
                        set number of IPv4 server threads (1)
    -6, --ipv6-threads NUM
                        set number of IPv6 server threads (1)
    -a, --ipv4-address ADDR:PORT
                        set local address of IPv4 server sockets (0.0.0.0:123)
    -b, --ipv6-address ADDR:PORT
                        set local address of IPv6 server sockets ([::]:123)
    -s, --server-address ADDR:PORT
                        set server address (127.0.0.1:11123)
    -u, --user USER     run as USER
    -r, --root DIR      change root directory
    -d, --debug         Enable debug messages
    -h, --help          Print this help message

It should listen on all addresses.

rsntp doesn’t set the source address for its responses. As a workaround, you can start multiple instances bound to different addresses specified by the -a and -b options.
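
A minimal sketch of that workaround, using only the -4, -a and -s options from the help output above. The two addresses are documentation placeholders (not from this thread), the backend is assumed to be the default 127.0.0.1:11123, and IPv6 is left at its defaults here (it would need the same treatment with -b). The function prints the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Print one rsntp command line per local IPv4 address.
# The addresses passed in are placeholders; replace them with the host's own.
rsntp_commands() {
    threads=$1   # IPv4 server threads per instance (-4)
    shift
    for addr in "$@"; do
        # -a binds the server sockets to a single address, so replies leave from it.
        # -s points every instance at the same backend (assumed default).
        echo "rsntp -4 $threads -a $addr:123 -s 127.0.0.1:11123"
    done
}

rsntp_commands 2 192.0.2.10 192.0.2.11
```

Each printed line would then be started as its own process or service.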

I’d suggest just running multiple chronyd instances, as described in the FAQ. In my tests on a 4-core CPU, 4 chronyd instances can handle the full speed of a 1 Gbit interface (1100 kpps). It needs about 20-30% more CPU than rsntp, but it is more accurate and has all the NTP features.
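
A rough sketch of what such a setup could look like: one client instance that synchronises the clock and serves time on loopback port 11123, plus N server instances that follow it and answer on port 123. The file paths, the pool name, the instance count and the exact set of directives are assumptions for illustration; the FAQ is the authority on the recommended configuration for a given chrony version (the copy option in particular needs a recent release).

```shell
#!/bin/sh
# Generate chrony.conf files for one client instance and N server instances.
# All paths and the pool name are illustrative assumptions.
write_chrony_configs() {
    dir=$1      # output directory for the generated files
    count=$2    # number of server instances, e.g. one per CPU core
    mkdir -p "$dir"

    # Client instance: synchronises the system clock and serves time
    # to the server instances over loopback on port 11123.
    cat > "$dir/client.conf" <<EOF
pool pool.ntp.org iburst
port 11123
bindaddress 127.0.0.1
allow 127.0.0.1
pidfile /run/chronyd-client.pid
driftfile /var/lib/chrony/drift-client
EOF

    i=1
    while [ "$i" -le "$count" ]; do
        # Server instance: follows the client instance and answers on port 123.
        # Each one gets its own pidfile, driftfile and command socket.
        cat > "$dir/server$i.conf" <<EOF
server 127.0.0.1 port 11123 minpoll 0 maxpoll 0 copy
allow
cmdport 0
bindcmdaddress /run/chrony/chronyd-server$i.sock
pidfile /run/chronyd-server$i.pid
driftfile /var/lib/chrony/drift-server$i
EOF
        i=$((i + 1))
    done
}

outdir=$(mktemp -d)
write_chrony_configs "$outdir" 4
ls "$outdir"
```

Each file would then be passed with -f, for example chronyd -f "$outdir/client.conf" for the client and chronyd -x -f "$outdir/server1.conf" for a server instance, so that only the client instance adjusts the system clock.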

Just went over the source of chronyd and there is a bit of a catch.
If you want it to use multiple instances by itself, you MUST run it in daemon mode.
If you run it in the foreground, in debug mode, or with any other screen output, it will not run more than one instance of itself.

Chrony does it by itself if run in background, as it should by default when installed.

What error do you see?

I don’t think foreground or background matters. The different instances just need to use different pidfiles, driftfiles, Unix sockets, etc. As a test, you can try:

# chronyd -d 'pidfile /run/chrony-client1.pid' &
# chronyd -d -x 'pidfile /run/chrony-server1.pid' &
# chronyd -d -x 'pidfile /run/chrony-server2.pid' &

The three instances should all start and keep running.

Yes it matters.

If you use any of the command-line parameters like -d (debug) or -p (print config), it sets nofork to 1 in the source.
If you run it without any such parameters, it will fork.

Also, I would suggest setting the -m parameter, as it locks chrony into RAM so it can’t be paged out to swap.

Here it happens in the code:

case 'd':
        debug++;
        nofork = 1;
        system_log = 0;
        break;

and

case 'p':
        print_config = 1;
        user_check = 0;
        nofork = 1;
        system_log = 0;
        log_severity = LOGS_WARN;
        break;

and

case 'q':
        ref_mode = REF_ModeUpdateOnce;
        nofork = 1;
        client_only = 0;
        system_log = 0;
        break;
case 'Q':
        ref_mode = REF_ModePrintOnce;
        nofork = 1;
        client_only = 1;
        user_check = 0;
        clock_control = 0;
        system_log = 0;
        break;

If you run with any of -d, -p, -q or -Q, it will not fork; otherwise it does.

That is true, but I don’t see how it is relevant to running multiple instances. You can run them all with or without the -d option. It does not matter. They can be started manually from the terminal, or they can be started as system services with systemctl.

It will not enter the go_daemon section where it communicates to the grandparent.

You will probably end up with multiple instances where they all need a different port for access.
You need the nofork to be 0.

Or I’m completely lost, can also be the case :slight_smile:

Hey Guys,

Make sure that when you set up multiple addresses on an interface, you do it correctly. You may be experiencing asymmetrical traffic between the IP addresses. I was having issues running multiple IPv6 addresses on an interface: with the IPs in the same collision domain I was seeing asymmetrical traffic patterns, and NTP UDP traffic doesn’t like that.

You could see the traffic with tcpdump. The same thing can happen with IPv4 addresses too. Best to keep them on separate networks if you do it at all.
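
For instance, a capture along these lines would show whether a reply leaves from the same address and interface the request arrived on. The client address is a documentation placeholder, and the command is printed rather than executed because tcpdump normally needs root.

```shell
#!/bin/sh
# Build a tcpdump command that shows NTP traffic for one client.
#   -n      no name resolution
#   -e      print the link-level header (helps tell interfaces apart)
#   -i any  capture on all interfaces (Linux)
ntp_capture_cmd() {
    client=$1   # placeholder client address to filter on
    echo "tcpdump -n -e -i any udp port 123 and host $client"
}

ntp_capture_cmd 198.51.100.7
```

If the request is addressed to one local IP and the reply carries a different source IP, the client will most likely discard it.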

Unless you are forcing NTP to live on specific interfaces, it natively binds to all IPs on all interfaces, and that’s when you start to see weird stuff happen.

Please elaborate on that.
Many of my boxes use a 1 or 10 Gig NIC with quite a few IPs assigned to it. Why is it a problem to let chrony bind to all interfaces? I don’t care if it listens on the internal interface or on loopback, or should I care, to avoid “weird stuff happening”?

I guess you have a typical multi-WAN misconfiguration problem, and it does not depend on the service type (NTP, SSH, etc.). If that’s the case, you need to build policy-based routing logic that binds connections to interfaces, so that reply packets go out directly on the interface where the connection came in.
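
A minimal sketch of source-based policy routing with iproute2, assuming two uplinks. The addresses, gateways, device names and table numbers are all placeholders; the commands are printed rather than executed so they can be checked against the real topology first.

```shell
#!/bin/sh
# Print source-based routing commands for one local address.
# Arguments: local address, gateway, device, routing table number.
pbr_commands() {
    addr=$1
    gw=$2
    dev=$3
    table=$4
    # Packets sourced from this address consult their own routing table...
    echo "ip rule add from $addr table $table"
    # ...whose default route leaves through the interface that owns the address.
    echo "ip route add default via $gw dev $dev table $table"
}

pbr_commands 192.0.2.10 192.0.2.1 eth0 101
pbr_commands 198.51.100.10 198.51.100.1 eth1 102
```

With rules like these, a reply sent from 192.0.2.10 goes out through eth0 regardless of the main table’s default route.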

I think there may be two different issues conflated here. One is that NTP clients won’t accept a response coming from a different IP address, so NTP servers need to specify the source address for each response (like chronyd does) or use multiple sockets bound to the individual addresses of the server (like ntpd does).

The other issue is that NTP works better with symmetric routing. An NTP server should respond on the interface which received the client request. That may not be possible in some configurations. chrony versions 3.4–4.0 specified the outgoing interface in addition to the source address, but that completely broke the server functionality on some systems using asymmetric routing and there doesn’t seem to be an easy way to detect this case. Later versions let the kernel select the interface again. One way to restore the previous behavior might be running multiple chronyd instances bound to the individual network devices (using the binddevice directive).
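
A sketch of what per-device instances might look like, assuming a chrony version that has the binddevice directive. The device names and file paths are placeholders, and the time source lines are left out on purpose; this only prints the fragment that differs between instances.

```shell
#!/bin/sh
# Print a chrony.conf fragment for a server instance tied to one network device.
# Device names and paths are illustrative assumptions.
binddevice_config() {
    dev=$1
    cat <<EOF
binddevice $dev
allow
cmdport 0
bindcmdaddress /run/chrony/chronyd-$dev.sock
pidfile /run/chronyd-$dev.pid
driftfile /var/lib/chrony/drift-$dev
EOF
}

binddevice_config eth0
binddevice_config eth1
```

Each fragment would go into its own configuration file together with the instance’s time source, one chronyd process per device.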

I didn’t mean to indicate that binding to all interfaces is the problem. In fact, you want 127.0.0.1 (localhost) to serve up time to local applications. It is when you have multiple IPs in the same network that you can get asymmetrical routing of packets. @mlichvar stated it pretty well from the client’s perspective. This was the issue I was having.

Unless you are doing some sort of hosting that requires you to use multiple IPs on the same physical interface (including sub-interfaces), I don’t see the need for it, and not doing it avoids the “weird stuff” I was talking about.

I used to use IP’s to separate out instances of https websites on the same host, but don’t do it that way anymore since apache2 and openssl can host vhosts without binding to separate IP’s on the same interface. All the FQDN’s resolve to 1 address.

This keeps it simple for me. You may have a much more complex environment than I require. I also use iptables and ip6tables for my stateful firewalls and use -m state directives, matching on the state of initiated traffic, so that asymmetrical traffic patterns are not allowed. So you have to be mindful of your firewall too.

I don’t know of anybody here actually running an NTPD or Chrony instance wide open on the Internet. So most people will be going through a firewall of some type.

I am a small outfit with 5 servers and 3 NTP network appliances, for a total of 5 IPv4 instances and 8 IPv6 instances. Each pool instance uses the 1 Mbit traffic setting. The 5 servers are dual-stack, and the appliances are dedicated IPv6. 99% of the time, unless the NTP pool poller takes a network hit, all my instances run at a score of 20.

The Symmetricom appliances allow me to hook up a serial port through USB to Serial converters and elevate the servers to stratum 1 for maximum stability. I use NTPD with the truetime clock driver to accomplish this.