Is there a Chrony equivalent for 'ntpq ifstats'?

Not strictly pool related, but I’m hoping the knowledgeable folk here may have some ideas…

I operate two public servers (both are in the pool). One runs ntpd, the other chronyd. ntpd has a nice command, ‘ntpq ifstats’, that provides NTP request/response stats for each server interface / IP address combination. By sampling this every hour I can get a really good picture of the traffic per network, including the IPv4 / IPv6 split. I’ve not been able to find anything similar for chrony, though; am I missing something, or does chrony simply not provide this kind of statistics? Any thoughts on a (lightweight, simple-to-automate) mechanism to capture similar stats on the server that runs chrony?
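
The closest thing I’ve found built into chrony itself is ‘chronyc serverstats’ (and ‘chronyc clients’ for per-client counts), but as far as I can tell neither breaks the numbers down per local interface or address:

# Aggregate NTP packet counters from chronyd (exact fields vary by chrony version)
chronyc serverstats
# Per-client request counts; still no per-interface / per-address breakdown
chronyc clients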

The approach I take is to use iptables. I have a bunch of rules that match NTP on each address that my servers are listening on, and iptables then keeps track of how many packets hit each rule:

Chain ntp_in (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0            all  --  any    any     anywhere             sntp.cam.ac.uk       /* sntp */
2442K  186M            all  --  any    any     anywhere             ntp0.cam.ac.uk       /* cam */
 636M   48G            all  --  any    any     anywhere             pool.ntp0.cam.ac.uk  /* pool */
 583K   43M ntpcsx     all  --  any    any     anywhere             please-update-config-to-use.ntp0.cam.ac.uk  /* csx */
 244K   19M DROP       all  --  any    any     anywhere             anywhere             limit: above 20000/sec burst 20000 mode dstip

In my case, these numbers are then collected by CollectD and fed into our general metrics system, but you could doubtless feed them into whatever you’re currently doing with ifstats.
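
If it helps, counting-only rules like the ones above can be created along these lines (a rough sketch; the chain name and the 192.0.2.10 address are placeholders rather than my real config):

# Dedicated accounting chain, with all inbound NTP traffic sent through it
iptables -N ntp_in
iptables -A INPUT -p udp --dport 123 -j ntp_in
# A rule with no target does nothing except count packets/bytes for that destination
iptables -A ntp_in -d 192.0.2.10 -m comment --comment "pool"
# Read the counters back (-x shows exact counts); repeat with ip6tables for IPv6
iptables -L ntp_in -v -n -x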

Thanks, this seems promising. This server (a cloud VM) uses firewalld, which I believe uses iptables underneath, so I just need to figure out how to set up any extra rules needed and how to view the stats. I feel a new project coming on…
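
From what I’ve read so far, firewalld’s ‘direct’ interface should let me pass counting rules straight through to the backend; something along these lines, although I haven’t tested it yet and the address is a placeholder (the direct interface is deprecated in newer firewalld releases, but still works):

# IPv4 shown; repeat with 'ipv6' for the IPv6 counters
firewall-cmd --permanent --direct --add-chain ipv4 filter ntp_in
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p udp --dport 123 -j ntp_in
firewall-cmd --permanent --direct --add-rule ipv4 filter ntp_in 0 -d 192.0.2.10 -m comment --comment pool
firewall-cmd --reload
# Counters can then be read with: iptables -L ntp_in -v -n -x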

Check out this thread: Monitoring Script. It provides very nice stats.

@csweeney05 Thanks. I took a look and actually deployed it on my cloud system. However, it (a) doesn’t break out IPv4 versus IPv6 traffic and (b) doesn’t allow you to show only NTP traffic (i.e. traffic to/from port 123). Since this system hosts other things besides the chrony server, it unfortunately doesn’t really give me what I’m looking for.
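
What I’m now leaning towards is nftables named counters in their own table, which should count only port-123 traffic and keep the IPv4/IPv6 split separate. A rough sketch of the idea (the table, chain and counter names are just examples):

nft add table inet ntpstats
nft add chain inet ntpstats ntp_in '{ type filter hook input priority -10 ; }'
# One named counter per address family
nft add counter inet ntpstats ntp_v4
nft add counter inet ntpstats ntp_v6
nft add rule inet ntpstats ntp_in udp dport 123 meta nfproto ipv4 counter name ntp_v4
nft add rule inet ntpstats ntp_in udp dport 123 meta nfproto ipv6 counter name ntp_v6
# Sample the counters, e.g. hourly from cron
nft list counters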

Running a time server inside a VM is not really recommended… just sayin’

That’s not really true anymore these days, at least generally speaking.

Perhaps @stevesommars has thoughts on this (he has looked into it), since quite a few pool servers are virtual these days and appear to perform reasonably well, at least within ‘pool standards’.

Really not true these days (and hasn’t been true for several years). Provided things are set up properly and adequately resourced, a VM is fine. This cloud-hosted VM has 4 ARM64 cores, 32 GB RAM, 256 GB NVMe storage and 4 Gbit/s internet bandwidth, and will give many a ‘real’ machine a good run for its money. If you didn’t know it was a VM you would find it hard to tell.

I don’t know what “pool standards” means. If an NTP server returns a Mode 4 response where at least one of the two timestamps is in error (taking into account root delay & root distance), then it is a falseticker, at least for some period. By that criterion, many NTP servers are occasionally falsetickers. The NTP monitoring system identifies persistent falsetickers and removes them from the pool; transient falsetickers tend to remain in the pool.

I’ve written a few guest articles on Johannes Weber’s blog. One covers time inaccuracies on commercial Microsoft Azure VMs. I’m writing another VM-related article now.

Side note: AWS provides a PTP hardware clock (PHC) on EC2 instances which can be accessed by chrony 🙂
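
If anyone wants to try it, the chrony side is roughly a single refclock directive. A sketch only: /dev/ptp0 is the usual device path on ENA-based instances, but check what your instance actually exposes:

# chrony.conf: use the instance's local PTP hardware clock as a reference
refclock PHC /dev/ptp0 poll 0 prefer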

Doesn’t it create too much jitter/delay, as it has to go through two network stacks, one in the VM guest itself and the other in the host server?

Two aspects:

  • Does it add jitter/latency?
  • If jitter/latency is being added, is it large enough, compared to other factors such as actual external network transit jitter/latency, that it materially changes the overall performance?

On the first, I don’t think anything noticeable is added with modern systems. Contrary to what the phrasing “it has to go through two network stacks” might suggest, it is not really two full network stacks. Details obviously depend on the virtualization implementation, but in many cases the virtual network card in the guest is not fully emulating an actual hardware NIC; it is aware of its own virtualized nature (just like, e.g., the block storage or memory management). I.e., packets don’t go up one entire network stack and back down the same one, only to then enter and go up a second network stack. Rather, at least the lower parts of the stack are entirely on the host system, and only the upper parts are actually in the guest. Very roughly like the separation that also exists on a bare-metal instance, where a packet enters and is initially handled within the kernel before being passed to the user-space time daemon.

What is more of a challenge is the typical resource-sharing nature of virtualization, i.e., does the guest running the time daemon get the resources it needs in a timely fashion? On a well-managed system, that shouldn’t be a problem. On over-committed systems, it may become one.

On the second item, is any jitter/delay incurred relevant in the grand scheme of things? In many cases probably not. E.g., “pool standards” to me implies potentially some network distance between a client and the server. In those cases, unless the virtualization is somewhat poor, e.g., highly over-committed, the jitter/delay incurred on the external network would by far outweigh anything that is going on between a virtualized server and its host.

On the other hand, if one needs to synchronize to high accuracy on a local, high-speed unloaded network, it might become more relevant. But then, maybe NTP isn’t even the right protocol anymore, if the requirements are really high. Or at least maybe even a bare-metal but general-purpose server isn’t good enough anymore, and one needs a specialized hardware device with external reference and high-accuracy oscillator.

Now, I don’t have any specific numbers or statistics, but I think the references mentioned previously will have such data, or will have pointers to it. And my description above is based on my take-away from sources such as those.

It very much depends on the cloud infrastructure, hypervisor, resource allocation/sharing, etc. My cloud VM server seems to be behaving just fine as far as I can see. I know that Oracle Cloud paid a lot of attention to those things in the design (I used to work at Oracle). I can’t speak for other clouds, but I imagine they did too.