I’m investigating a strange phenomenon in a fleet of IoT devices: significant time jumps on the system clocks. The jumps can be minutes, but some are months. The erroneous time never lasts long (at most an hour by our estimate, but usually a matter of minutes). Across a fleet of approximately 2000 devices in diverse locations, this occurs perhaps once a week.
We have very little evidence to explain why this is happening. However, the system log does show the SNTP client updating the system clock both at the moment the clock jumps to the wrong time and at the moment it jumps back again. The client is set to the default NTP pool for the base image provided by our supplier: [0,1,2,3].debian.pool.ntp.org.
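To gather more evidence independently of the SNTP client’s own log, one option would be to sample the wall clock alongside a monotonic clock and flag any interval where the two disagree. The following is only a sketch of that idea in Python: `find_clock_steps` and its 60-second threshold are hypothetical names and values of my own, not something currently running on the devices.

```python
def find_clock_steps(samples, threshold_s=60.0):
    """Flag wall-clock steps in a series of paired clock readings.

    samples: list of (monotonic_s, wall_s) tuples, taken in order.
    Returns a list of (index, step_s): the sample index at which the wall
    clock moved by more than threshold_s relative to the monotonic clock,
    and the size of that step in seconds (negative means a backwards step).
    """
    steps = []
    for i in range(1, len(samples)):
        mono_prev, wall_prev = samples[i - 1]
        mono_now, wall_now = samples[i]
        # If nothing stepped the clock, both clocks advance by the same amount.
        step = (wall_now - wall_prev) - (mono_now - mono_prev)
        if abs(step) > threshold_s:
            steps.append((i, step))
    return steps
```

On a device the pairs could be collected with `time.monotonic()` and `time.time()` at a fixed interval, which would at least give a record of each jump and its return that does not depend on the SNTP client’s logging.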
Obviously we’re investigating software / hardware issues on our side, but having spent some weeks on this I’d like to get a view on the reliability of the Debian pool. As I understand it, SNTP is much less fault tolerant than NTP. How likely is it that we receive sporadic incorrect times from one of these servers? Here “incorrect” means more than a minute out.
Does anyone have any other suggestions about possible causes, or ways we can guard against this? We have already spotted instances of home routers reconfiguring the time server via DHCP, and we’ve now overridden this as a precaution. Is anyone aware of any ISP tampering with ntp.org DNS names or intercepting NTP packets?
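One guard I’m considering, as a rough sketch only, is to query several servers and step the clock only when a majority of them agree. The Python below assumes the per-server offsets have already been measured by some other means. `choose_offset`, the 60-second tolerance (taken from my definition of “incorrect”) and the three-server minimum are hypothetical choices of mine, not anything our current SNTP client does.

```python
from statistics import median

MAX_DISAGREEMENT_S = 60.0  # assumption: "incorrect" means more than a minute out
MIN_SERVERS = 3            # assumption: need at least three samples to outvote one bad server


def choose_offset(offsets, max_disagreement=MAX_DISAGREEMENT_S, min_servers=MIN_SERVERS):
    """Return the clock offset (seconds) to apply, or None if the samples look untrustworthy.

    offsets: measured offsets (server time minus local time), one per server.
    """
    if len(offsets) < min_servers:
        return None
    mid = median(offsets)
    # Keep only the samples that sit close to the median.
    agreeing = [o for o in offsets if abs(o - mid) <= max_disagreement]
    # Require a strict majority to agree; otherwise leave the clock alone.
    if len(agreeing) * 2 <= len(offsets):
        return None
    return sum(agreeing) / len(agreeing)
```

With this, a single server reporting a time months out would be outvoted by the others, and a reply from only one or two servers would never move the clock. Whether that trade-off is acceptable on devices with flaky connectivity is something I’d welcome opinions on.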
All suggestions welcome.