NTP client on gateway not able to receive server response on a specific interface

Hello,

This is my first post in this forum; thanks for adding me to the group.

All our gateways run the following NTP client version:
ntpd - NTP daemon program - Ver. 4.2.8p8

Configurations on the gateway:

  1. 16 VLAN interfaces (L3). VLAN 1 (IP address 10.16.48.23) is the only uplink; all other VLANs are downlinks.
  2. The uplink interface (VLAN 1) is expected to sync time from the NTP server “216.12.243.109”, and the downlink interfaces have the NTP time service enabled so that clients (phones, servers, printers) can sync their clocks. ntpd is started as follows:
    /mswitch/bin/ntpd -g -L -I 10.16.48.23 -I eth1.2 -I eth1.3 -I eth1.4 -I eth1.5 -I eth1.6 -I eth1.7 -I eth1.8 -I eth1.9 -I eth1.10 -I eth1.11 -I eth1.12 -I eth1.13 -I eth1.14 -I eth1.15 -I eth1.16
  3. /etc # cat ntp.conf
    server 216.12.243.109
    driftfile /etc/ntp.drift
    restrict default nomodify noquery
    restrict 127.0.0.1
    restrict -6 default nomodify noquery
    restrict -6 ::1
    tinker panic 1000
    enable mode7
    server 127.127.1.0
    fudge 127.127.1.0 stratum 10

Issue:
The gateway was stable for around 6-7 months; then, for an unknown reason, the VLAN 1 interface stopped receiving NTP server response packets. This caused the NTP client to fall back to the local clock.

Logs:
NTP Server Table Entries

Flags: * Selected for synchronization
       + Included in the final selection set
       # Selected for synchronization but distance exceeds maximum
       - Discarded by the clustering algorithmn
       = mode is client

remote           local          st  poll  reach  delay    offset    disp
*LOCAL(0)        127.0.0.1      10  64    377    0.00000  0.000000  0.03049
=216.12.243.109  192.168.1.222  16  1024  0      0.00000  0.000000  3.99217
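
For anyone reading the reach column: assuming this table follows the usual ntpq convention, reach is an 8-bit shift register printed in octal, one bit per poll, with the most recent poll in the lowest bit. A minimal Python sketch (the helper name `decode_reach` is made up for illustration) that expands the two values from the table:

```python
def decode_reach(reach_octal):
    """Expand an octal 'reach' value into 8 poll results, oldest first."""
    value = int(reach_octal, 8) & 0xFF
    # Bit 7 is the oldest of the last eight polls, bit 0 the most recent.
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


for name, reach in (("LOCAL(0)", "377"), ("216.12.243.109", "0")):
    history = decode_reach(reach)
    print(f"{name}: {sum(history)} of the last 8 polls were answered")
```

Under that assumption, 377 means every one of the last eight polls of the local clock succeeded, while 0 means none of the last eight polls of 216.12.243.109 produced a usable reply.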

/tmp # /mswitch/bin/ntpdc
ntpdc> listpeers
client LOCAL(0)
client 216.12.243.109

/tmp # ntpdate -d 216.12.243.109
4 Jun 10:26:03 ntpdate[18111]: ntpdate 4.2.8p8@1.3265 Tue Jul 12 06:29:04 UTC 2016 (3)
Looking for host 216.12.243.109 and service ntp
216.12.243.109 reversed to Loopback0.core0.dal0.wayport.net
host found : Loopback0.core0.dal0.wayport.net
transmit(216.12.243.109)
receive(216.12.243.109)
transmit(216.12.243.109)
receive(216.12.243.109)
transmit(216.12.243.109)
receive(216.12.243.109)
transmit(216.12.243.109)
receive(216.12.243.109)
server 216.12.243.109, port 123
stratum 2, precision -21, leap 00, trust 000
refid [216.12.243.109], delay 0.05338, dispersion 0.00003
transmitted 4, in filter 4
reference time: e464dffd.e6430be5 Fri, Jun 4 2021 10:23:09.899
originate timestamp: e464e0e9.f61f8d15 Fri, Jun 4 2021 10:27:05.961
transmit timestamp: e464e0b1.b49bc3d2 Fri, Jun 4 2021 10:26:09.705
filter delay: 0.05350 0.05339 0.05338 0.05350
0.00000 0.00000 0.00000 0.00000
filter offset: 56.24201 56.24199 56.24195 56.24196
0.000000 0.000000 0.000000 0.000000
delay 0.05338, dispersion 0.00003
offset 56.241953
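
The hex timestamps in this output are NTP timestamps: 32 bits of seconds since 1900-01-01 UTC, a dot, then a 32-bit binary fraction. As a rough cross-check of the reported offset, here is a small Python sketch (the function name `ntp_hex_to_utc` is hypothetical) that decodes the originate and transmit timestamps above. It prints UTC, whereas ntpdate prints the gateway's local time.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_hex_to_utc(stamp):
    """Convert 'ssssssss.ffffffff' (hex seconds.fraction since 1900) to UTC."""
    sec_hex, frac_hex = stamp.split(".")
    seconds = int(sec_hex, 16)
    # The fraction is a 32-bit binary fraction of one second.
    micros = int(frac_hex, 16) / 2**32 * 1_000_000
    return NTP_EPOCH + timedelta(seconds=seconds, microseconds=micros)


originate = ntp_hex_to_utc("e464e0e9.f61f8d15")
transmit = ntp_hex_to_utc("e464e0b1.b49bc3d2")
print("originate:", originate.isoformat())
print("transmit: ", transmit.isoformat())
print("difference (s):", (originate - transmit).total_seconds())
```

The two timestamps differ by roughly 56 seconds, which is in line with the offset ntpdate reports.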

A closer look at the iostats output reveals that the NTP client isn’t receiving any incoming packets on VLAN 1: the “received packets” field is zero.

ntpdc> iostats
time since reset: 84776
receive buffers: 10
free receive buffers: 9
used receive buffers: 0
low water refills: 1
dropped packets: 0
ignored packets: 0
received packets: 0
packets sent: 195
packets not sent: 0
interrupts handled: 84786
received by int: 10
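
Since many gateways are affected, this condition could be detected automatically. A minimal Python sketch, assuming the iostats text has already been captured from ntpdc on each gateway; the function names `parse_iostats` and `receive_path_stalled` are made up for illustration, and the sample is the output shown above:

```python
def parse_iostats(text):
    """Parse 'name: value' lines from ntpdc iostats output into integers."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def receive_path_stalled(stats):
    """True when packets are being sent but none are counted as received."""
    return stats.get("packets sent", 0) > 0 and stats.get("received packets", 0) == 0


sample = """\
time since reset: 84776
receive buffers: 10
free receive buffers: 9
used receive buffers: 0
low water refills: 1
dropped packets: 0
ignored packets: 0
received packets: 0
packets sent: 195
packets not sent: 0
interrupts handled: 84786
received by int: 10
"""

stats = parse_iostats(sample)
print("stalled:", receive_path_stalled(stats))
```

This only flags the symptom (packets sent, none received); it says nothing about the cause.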

The Linux network stack shows no stuck or dropped packets in the UDP receive queue. The following commands were used for analysis at the time of the issue:

  1. netstat -s -u
  2. ss -u -a -e
  3. tcpdump -i eth1.1 udp port 123
  4. cat /proc/net/udp
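
To make the /proc/net/udp check repeatable, the per-socket receive queue and drop counter for port 123 can be extracted with a short script. This is a sketch under some assumptions: the kernel prints the usual column layout with `drops` as the 13th field (older kernels may lack it), and the CPU is little-endian, which determines how the hex IPv4 address is decoded. The function name `ntp_sockets` and the sample line are invented for illustration and are not taken from a real gateway.

```python
import socket
import struct


def ntp_sockets(proc_net_udp, port=123, little_endian=True):
    """Return (local_addr, rx_queue_bytes, drops) for UDP sockets bound to port."""
    rows = []
    for line in proc_net_udp.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue
        addr_hex, port_hex = fields[1].split(":")
        if int(port_hex, 16) != port:
            continue
        # The address is the raw 32-bit value printed in host byte order.
        fmt = "<I" if little_endian else ">I"
        addr = socket.inet_ntoa(struct.pack(fmt, int(addr_hex, 16)))
        rx_queue = int(fields[4].split(":")[1], 16)  # field is tx_queue:rx_queue
        drops = int(fields[12]) if len(fields) > 12 else None
        rows.append((addr, rx_queue, drops))
    return rows


# Made-up sample line for a socket bound to 10.16.48.23:123 (assumption).
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when "
    "retrnsmt   uid  timeout inode ref pointer drops\n"
    "   42: 1730100A:007B 00000000:0000 07 00000000:00000000 00:00000000 "
    "00000000     0        0 1234 2 0000000000000000 0\n"
)

for addr, rx_queue, drops in ntp_sockets(SAMPLE):
    print(f"{addr}:123 rx_queue={rx_queue} drops={drops}")
```

On a gateway, the contents of /proc/net/udp would be passed in instead of `SAMPLE`. A non-zero rx_queue that never drains, or a growing drops counter, on the socket bound to the VLAN 1 address would point at ntpd not reading that socket.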

Workaround: restarting the NTP client resolves the issue. We don’t know what triggers it, and a large number of our gateways are currently affected.

Thanks in advance for your help.