Another NTP client failure story


#1

I came across this funny story: https://strugglers.net/~andy/blog/2018/12/24/the-internet-of-unprofitable-things/


#2

Sadly there are probably lots more out there that have done the same thing.

Right now a branch of Cisco that makes IoT thingies appears to be pounding my NTP server with requests from a very specific netblock. When you log by IP they stick out like a sore thumb, but compared to the overall volume of requests it’s not worth getting into a huge battle over. I’ve tried contacting them with no luck… Every now and then I block that netblock hoping someone will raise an eyebrow and wonder, but maybe they went one step further and hard-coded multiple IPs or something… Queries don’t let up even when blocked, so it’s definitely hard-coded and definitely not a well-behaved client.


#3

Jasper Technologies and 204.156.180.0/22?

I don’t think they’ve hardcoded anyone’s IPs – I think they’re doing normal DNS lookups and slamming the entire Pool that hard.

My hope(?) is that they actually have like a billion devices behind a small NAT block.


#4

Yes, that’s them! That’s the netblock!

Even when blocked (for weeks) they keep hitting my IP persistently…

We’ve had Netgear, SMC, D-Link, TP-Link, and Snapchat all make “coding mistakes”… Seems only right Cisco gets tossed in the mix…


#5

Lurker here in the community from Network Time Foundation. Love to know what part of Cisco this is so I can go chase them down. I work for NTF, and these are opportunities to find funding for the Pool and for NTF.

IMHO, these huge orgs need to be reminded of the rules, and especially that they should be supporting the NTP Pool (and NTF) to keep the service available and scaling to the needs of users (IoT, OMG!).

Steve Sullivan


#6

It’s Jasper Technologies – it was a company acquired by Cisco in 2016. Their website now uses “Cisco Jasper” branding. (I only use their old name because they haven’t updated their IP addresses’ whois information since 2015.)


#7

Still a candidate for their own Vendor Zone for the DNS though, right?


#8

At the very least, yes… You might be able to not only talk Jasper into one, but Cisco too (I don’t know what their routers and such use as default).

Though I’m curious what on earth is making all those queries, if it’s just some IoT things or if it’s more, and if they are even aware of it.


#9

I was put in contact with someone at Cisco Jasper today. I want to have a conversation with him in the next week to discuss the NTP Pool and Vendor Zones.


#10

If it’s all coming from one netblock, then presumably it would be easy enough for them to set up a local NTP server. If the devices are hard-coded with pool.ntp.org, they would have to spoof the DNS responses, but they should still be able to do it somehow.


#11

That’s good news! Please keep us updated, I’m curious to hear what is going on within their network.


#12

Hi there Pool!

I made contact with Cisco Jasper.

Cisco Jasper customers are automotive manufacturers using the Cisco Jasper network to connect automobiles and mobile devices for internet service and diagnostics. Endpoints could be using any local time source, so that is not something Cisco Jasper can control.

They asked if we can provide figures for how much traffic you guys are seeing (in report form, to share internally). This would be used to float the idea of, and prove the need for, a Vendor Zone within Cisco Jasper.

I must point out, my contact said he has been using this for over 3 years and this is the first he has heard about Pool problems and the availability of the vendor zone. I think much of the time that you guys get slammed, it is because of a lack of awareness about the pool and how to use it. With IoT coming into its own, the slamming will increase otherwise :)

Can someone provide the data Cisco Jasper requested, so I can start the process of negotiating a vendor zone with them?

Also, as an aside, my contact at Cisco Jasper says that Cisco uses internal time sources for their own employee equipment (desktops, switches, etc.). It would be helpful for me to know if that is indeed the case from your network analysis perspectives.

Steve


#13

Well, as an example, just now I ran tcpdump on one of my Pool servers – US zone, speed setting 500 Mbps – and got 1000 packets from 204.156.180.0/22 in 43 seconds.

And I dropped almost all of them. Using default rate limiting settings, I’m pretty close to a black hole for them. I wonder if they even get usable time from the Pool.

Or:

$ nice ntpq -c "mrulist mincount=1300000 sort=-count"
Ctrl-C will stop MRU retrieval and display partial results.
Retrieved 31 unique MRU entries and 0 updates.
lstint avgint rstr r m v  count rport remote address
==============================================================================
     1      1  3f0 L 3 1 4641911 19375 [redacted]
    12      0  3b0 . 3 4 3650010 33353 [redacted]
     0      1  3f0 L 3 3 3429535 55406 [redacted]
     1      1  3f0 L 3 4 3284233 34048 204.156.180.176
     7      1  3f0 L 3 4 3280771 42550 204.156.180.182
     2      1  3f0 L 3 4 3267119  6491 204.156.180.183
     8      1  3f0 L 3 4 3264288 13700 204.156.180.177
     2      1  3f0 L 3 4 3254222 53455 204.156.180.179
     2      1  3f0 L 3 4 3251573 55287 204.156.180.181
     2      2  3f0 L 3 3 3231497 56611 [redacted]
     3      1  3f0 L 3 4 3230892 10878 204.156.180.180
     2      1  3f0 L 3 4 2902637 15685 204.156.180.178
     1      2  3f0 L 3 3 2832740   495 [redacted]
     2      2  3f0 L 3 3 2311053 15478 [redacted]
     1      1  3f0 L 3 3 2253325 46383 [redacted]
     0      1  3b0 . 3 4 2102937 18590 [redacted]
     1      2  3b0 . 3 4 1957184 34323 204.156.180.125
    25      2  3f0 L 3 4 1927618  7179 204.156.180.120
    22      2  3b0 . 3 4 1905523   123 204.156.180.126
    10      2  3b0 . 3 4 1896736 13001 204.156.180.124
     0      1  3f0 L 3 3 1759819  7475 [redacted]
     7      2  3b0 . 3 4 1542396 36363 204.156.180.122
     9      2  3b0 . 3 4 1496631   123 204.156.181.123
     2      2  3f0 L 3 4 1482457 17984 204.156.181.126
     1      2  3f0 L 3 4 1468126 10073 204.156.181.120
     8      2  3f0 L 3 4 1451252 26659 204.156.181.121
     0      2  3f0 L 3 4 1445748   123 204.156.181.125
    23      2  3b0 . 3 4 1436378   123 204.156.181.127
     2      3  3f0 L 3 4 1352490 46048 [redacted]
     5      2  3f0 L 3 4 1332150 45026 204.156.180.127
    14      2  3b0 . 3 4 1301119   123 204.156.180.121
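
To total the netblock's share of a listing like the one above, something along these lines might work. This is only a minimal Python sketch: the function name `count_for_netblock` is made up for illustration, the `sample` rows are a few lines copied from the output, and it assumes the nine-column layout shown here.

```python
import ipaddress

def count_for_netblock(mrulist_text, cidr):
    """Sum the 'count' column for rows whose remote address is inside cidr."""
    net = ipaddress.ip_network(cidr)
    total = 0
    for line in mrulist_text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # header, separator, or blank line
        try:
            addr = ipaddress.ip_address(fields[8])  # remote address column
            count = int(fields[6])                  # count column
        except ValueError:
            continue  # redacted or otherwise unparsable row
        if addr in net:
            total += count
    return total

# A few rows copied from the listing above (assumed column layout).
sample = """\
     1      1  3f0 L 3 4 3284233 34048 204.156.180.176
     7      1  3f0 L 3 4 3280771 42550 204.156.180.182
     2      2  3f0 L 3 3 3231497 56611 [redacted]
     9      2  3b0 . 3 4 1496631   123 204.156.181.123
"""

print(count_for_netblock(sample, "204.156.180.0/22"))
```

Feeding it the full listing instead of `sample` would give the combined count for every listed address in 204.156.180.0/22.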


#14

My server in the US zone received from the 204.156.180.0/22 network about 90 million requests in the last three weeks. That’s about 50 packets per second on average. It has enabled rate limiting and it responded to more than 90% of the requests. To me it looks like a larger number of clients behind NAT, rather than a small number of broken clients.
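
For anyone who wants to check the averages, here is a quick Python sketch. It only redoes the arithmetic with the figures quoted in this thread (90 million requests over three weeks, and the 1000 packets in 43 seconds from the tcpdump sample); `average_rate` is just an illustrative helper name.

```python
def average_rate(packets, seconds):
    """Mean packets per second over an observation window."""
    return packets / seconds

three_weeks = 21 * 24 * 3600  # seconds in three weeks

print(round(average_rate(90_000_000, three_weeks), 1))  # 49.6
print(round(average_rate(1000, 43), 1))                 # 23.3
```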

On servers in a few other zones the rate of requests from this network seems to be much smaller.


#15

Are they routing all these devices through their own network?

I posted some stats in another thread: I was getting around 500k/day from their netblock on a US server at the 100Mb setting… That equates to around 5-6 a second, which is about 1/10th of @mlichvar’s rate, and that makes sense as he’s at the 1Gb setting for the pool. Likewise, @mnordhoff was at 23/s @ 500Mb (granted, his sample time was just those 43 seconds, but it is still close enough to mlichvar’s to confirm the overall rate).

If we extrapolate from a chart mlichvar posted in another thread, which shows that a single 1Gb server handles 0.95% of the US traffic, and assume they are querying the whole pool equally (which appears to be the case), then they are hitting just the US pool at: 50 / 0.95% * 86400 = 454,736,842 queries per day…

Likewise from the chart, the US pool receives about 389,474 queries per second, which puts Jasper alone at about 1.4% of that total (assuming they are only querying the US pool). If they are querying the global pool, which receives about 509,617 QPS, then that number drops to about 1%… I estimated 1% from my own server stats, and since I receive traffic from all over the world, that falls in line with the above estimates…
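
The extrapolation can be reproduced with a few lines of Python. This is just the arithmetic from the figures quoted in the thread, under the same assumption that the load is spread evenly across the pool; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

server_rate = 50           # requests/s from the netblock at one 1Gb US server
server_share = 0.0095      # that server's share of US pool traffic (0.95%)
us_pool_qps = 389_474      # US pool total, from the chart
global_pool_qps = 509_617  # global pool total, from the chart

# Scale one server's observed rate up to the whole US pool.
jasper_qps = server_rate / server_share
jasper_per_day = jasper_qps * SECONDS_PER_DAY

print(f"{jasper_per_day:,.0f} queries per day")
print(f"{jasper_qps / us_pool_qps:.2%} of US pool traffic")
print(f"{jasper_qps / global_pool_qps:.2%} of global pool traffic")
```

The shares work out to roughly 1.35% and 1.03%, in line with the rounded 1.4% and 1% figures.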

One company, creating 1% of the total GLOBAL pool traffic… Does the pool serve only 100 clients? No… It serves hundreds of millions…

Last time I posted numbers my server (@ 100Mb) was seeing ~2.4 Million unique IPs a day… A 1Gb server would theoretically see about 24 Million (Maybe @mlichvar has stats he can post on daily unique IPs on his servers) … I’m not sure the 0.95% would hold true as that would be about 2.5 Billion IPs, which is over half the IPv4 address space. There would be a fair amount of overlap as the same devices would be querying multiple servers in a day (either regular polling via NTP, or some regular polling interval doing SNTP)… If you divide by 5 (as a rough guess of how many servers one IP might query in a day), you get around 500 Million IPs which is more believable from the existing numbers.
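
The same back-of-the-envelope estimate as a Python sketch. The factor of 10 between the 100Mb and 1Gb settings and the overlap factor of 5 are the rough guesses from the paragraph above, not measurements.

```python
IPV4_SPACE = 2 ** 32

unique_ips_100mb = 2_400_000            # daily unique IPs seen at the 100Mb setting
unique_ips_1gb = unique_ips_100mb * 10  # assumes linear scaling with the setting
server_share = 0.0095                   # one 1Gb server's share of US traffic
overlap_factor = 5                      # guess: servers one IP queries per day

naive_total = unique_ips_1gb / server_share
deduplicated = naive_total / overlap_factor

print(f"{naive_total:,.0f} IPs before accounting for overlap")
print(f"{naive_total / IPV4_SPACE:.0%} of the IPv4 address space")
print(f"{deduplicated:,.0f} IPs after dividing by the overlap factor")
```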

The only reason it’s not a DDoS is because the queries are coming from a relatively small group of IPs. But as I pointed out, they are creating a very noticeable amount of traffic. Any NTP server which has the ‘limited’ statement in its default configuration (which is probably every one) is dropping packets from the Jasper 204.156.180.0/22 netblock. I believe the default ‘discard’ average query interval is 8 seconds (which is more than generous for how NTP should operate). Their query rate per IP is way too high, plain and simple. Even if traffic continues to grow from those IPs, the only thing that is going to happen is that more packets are going to get dropped. That’s assuming the pool servers are using the regular NTP distribution; I know nothing about Chrony, but I’m going to assume it’s very similarly configured / rate limited. Also, it’s not all of the ~1,000 IPs in Jasper’s netblock; last time I logged, it was only around 140 IPs creating all the traffic…
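
To illustrate why a busy NAT address ends up mostly dropped, here is a generic per-source leaky bucket in Python. To be clear, this is not ntpd's or chrony's actual code: the class name, the `burst` allowance of 8, and the traffic patterns are assumptions made up for the sketch. Only the 8-second average interval comes from the paragraph above.

```python
class PerSourceLimiter:
    """Generic leaky bucket keyed by source IP (illustrative only)."""

    def __init__(self, avg_interval=8.0, burst=8):
        self.avg_interval = avg_interval     # long-run budget: 1 packet per 8 s
        self.limit = avg_interval * burst    # how much "debt" a source may carry
        self.state = {}                      # source IP -> (debt, last arrival time)

    def allow(self, src, now):
        debt, last = self.state.get(src, (0.0, now))
        debt = max(0.0, debt - (now - last))  # debt drains one second per second
        ok = debt + self.avg_interval <= self.limit
        if ok:
            debt += self.avg_interval         # each answered packet costs 8 s
        self.state[src] = (debt, now)
        return ok

limiter = PerSourceLimiter()

# One well-behaved client polling every 64 seconds.
polite = sum(limiter.allow("192.0.2.1", t * 64) for t in range(10))

# Many devices behind one NAT address, together sending 1 packet per second.
nat = sum(limiter.allow("204.156.180.176", t) for t in range(100))

print(polite, "of 10 answered for the polite client")
print(nat, "of 100 answered for the NAT address")
```

After the initial burst, the NAT address gets about one answer every 8 seconds no matter how many devices share it, which is the "black hole" effect described earlier in the thread.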

I just re-enabled logging this morning… I’ll let it run for a day and PM you some specific IPs and traffic numbers. That might help determine exactly what is going on.

*Note: Any mistakes or things that don’t make sense above are because I’m still drinking my coffee this morning… lol.


#16

Hi Folks,

Thanks Matt and Jason for the info. I supplied it to my Cisco Jasper contact and he is taking this NEW (to him) info to the powers that be to start a discussion about a vendor zone. I will follow up with them next week.

Anyone on this channel know David Dai professionally/personally?

I think the ROOT of the problem here is that these Orgs are completely unaware of the Pool TOS and rules, and sometimes perhaps conveniently unaware. With 2 HUGE Orgs now (CISCO and VERIZON) I have to assume I am the first person to show them the Vendor Page on the Pool Site and have this discussion with them. That’s an opportunity right there - Increase awareness to get funding and everyone wins!


Thank you very much Steve. I will take a look and sync up internally and get back to you!!

Regards,

David Dai

Network Engineering | Cisco IoT BU


#17

I got this response from Cisco:

It seems that it was caused by a software bug on their system and will be fixed shortly. Here is the response from them last week:

“See below explanation for NTP behavior. This should be corrected next week.

The infotainment computer syncs its clock to the set {0.pool.ntp.org, 1.pool.ntp.org, 2.pool.ntp.org, 3.pool.ntp.org}, starting at boot time, and then based on clock drift (standard ntpd behavior).
The autopilot computer, at boot, tries to sync with the infotainment computer, but if it can’t for some reason, it tries pool.ntp.org. It continues once every 10 seconds until it syncs successfully, then syncs again once every hour.
There is also a bug that sends extraneous NTP requests every time an intel-based infotainment computer gets a cell or wifi connection. We’ve already disabled those requests, see SW-167232. This should be rolling out to the fleet in the next week.

So my educated guess is that the internet check used for the cell interface started failing for some cars, then the cell reconnects cascaded into many NTP requests across the large number of cars behind the small number of PAT IP addresses”
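
To get a feel for what the retry behaviour described in that explanation means in aggregate, here is a rough Python sketch. It models only the autopilot computer's two intervals (every 10 seconds until synced, then hourly); the fleet size and the fraction of cars stuck retrying are purely hypothetical numbers, not anything Cisco reported.

```python
RETRY_INTERVAL_S = 10      # unsynced: one request every 10 seconds
SYNCED_INTERVAL_S = 3600   # synced: one request per hour
SECONDS_PER_DAY = 86_400

def requests_per_day(interval_s):
    """Requests one device sends per day at a fixed interval."""
    return SECONDS_PER_DAY / interval_s

def fleet_qps(cars, fraction_unsynced):
    """Aggregate requests/s when part of the fleet is stuck in the retry loop."""
    unsynced = cars * fraction_unsynced
    synced = cars - unsynced
    return unsynced / RETRY_INTERVAL_S + synced / SYNCED_INTERVAL_S

print(requests_per_day(RETRY_INTERVAL_S))   # 8640.0
print(requests_per_day(SYNCED_INTERVAL_S))  # 24.0

# Hypothetical fleet: 100,000 cars, 5% unable to sync.
print(round(fleet_qps(100_000, 0.05), 1), "requests/s")
```

A car stuck in the retry loop sends 360 times as many requests as a synced one, so even a small unsynced fraction dominates the total.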

I am waiting for @ask to reply to me on email regarding Cisco’s request to see if the traffic has dropped significantly.

Is this something I should instead request here?

Steve


#18

I’ll try to remember to turn logging back on on my server for a couple of days and will let you know if the Jasper/Cisco queries have gone down or not…


#19

Thanks @littlejason99. Does anyone keep logs of DNS queries we could look at for this situation?

Even if the traffic has dropped Cisco still needs a vendor zone, so, that’s the next step.

Feel free to hit me at stevos@nwtime.org. I may forget to log in here tomorrow, and I don’t want to miss the data.

Steve


#20

If this is from cars and it’s making one request every 10 seconds through their netblock (implying their cell service), somebody is going to have a very big bill from their cell provider.

More to the point, it’s in a car. Don’t they all have GPS these days? Why are they using the Pool when they have a local stratum 1?

Another thought. There is probably a class action settlement to be had against vendors who violate the pool TOS.