Feedback: Why is my NTP Server costing me $500 per year?


#1

Hello fellow Server operators,

If you have time & are interested, could you please provide feedback on my post describing my experience running servers in the NTP Pool, “Why Is My NTP Server Costing $500/Year? Part 3”?

I’ll attribute all suggestions, corrections and improvements.

Quick summary:

  • It costs $750/year to run a 1Gbps NTP server in the US on GCE
  • ~3k pps
  • 10% of queries originate from the AWS cloud
  • For a US NTP server, the bandwidth used by the pool is 2% (e.g. on a 1Gbps connection, it’s 20Mbps). It’s much lower in Singapore and Germany.
  • The Snapchat incident added ~150GiB of NTP responses, and cost me $13 (AWS) and $18 (GCE)
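
As a rough sanity check on the Snapchat figures, the implied data-transfer price per GiB can be backed out from the totals. This is only a sketch: it assumes the whole extra charge was egress on the ~150GiB of responses, and the function name is made up for illustration.

```python
def implied_rate_per_gib(extra_cost_usd: float, extra_gib: float) -> float:
    """Back out a per-GiB egress price from a one-off traffic spike."""
    return extra_cost_usd / extra_gib

SNAPCHAT_GIB = 150  # approximate extra NTP responses, from the summary

# Assumption: the $13 and $18 were purely data-transfer charges.
aws_rate = implied_rate_per_gib(13, SNAPCHAT_GIB)
gce_rate = implied_rate_per_gib(18, SNAPCHAT_GIB)
print(f"AWS: ${aws_rate:.3f}/GiB, GCE: ${gce_rate:.3f}/GiB")
```

That works out to roughly $0.09/GiB on AWS and $0.12/GiB on GCE.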

Ask has already corrected my description of how the pool’s DNS works (i.e. it’s not round-robin), and I’ll update the post shortly.

Thanks in advance,

—Brian


#2

Hi Brian,
I salute you for the amount of research it took to write the article. It contains a lot of fine detail, but it did not answer the simple questions I was looking for.
Is this a commercial server or a hobby?
Is serving NTP the sole purpose of setting up these VMs?
Is it good or bad that it costs $750/year?
Are you encouraging more NTP server owners to follow in your footsteps or warning them to think twice?
Do you want your traffic to increase or drop?
Was the Snapchat incident a good thing or a bad thing?
Cheers
Leo


#3

Thanks Leo — that’s great feedback. Your points are far-reaching, and it will take a bit of work to weave them into the post. In the meantime, I’ll answer them directly now for you:

  • “commercial or hobby?”. Hobby.
  • “NTP server sole purpose?”. No. They’re also DNS servers with a PowerDNS back-end. One (GCE) is also a Concourse continuous integration server; however, the NTP traffic dwarfs the other traffic.
  • “750/yr — good or bad?”. Bad. As the year moves on, I’ll migrate some of the NTP workload onto a cheaper VM (probably DigitalOcean). I like AWS & GCE, but their data transfer costs are much more than some alternatives.
  • “encouraging or warning?” Both. I think I’ll try to call that out more. And encourage people to sign up for the pool.
  • “increase or drop?”. I’d like to see the traffic drop. It’d be nicer to have more people share the burden.
  • “Snapchat a good or bad thing?”. Bad thing. $13 bad. Or $18 bad for GCE.

#4

Thanks, Brian!
This now makes it easier to see your motivation for the analysis!


#5

It’s an interesting idea to try putting an order of magnitude cost on the system.

The total “netspeed” for the US zone is 86798173 (“kbps”), so a single 1 Gbps system should get a little over 1% of the US SNTP traffic (and the same share of new NTP traffic).

If the cost for all that traffic was similar to what you saw, that’s about $64,000 per year just for the US.
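
The extrapolation can be reproduced in a few lines. This is a back-of-the-envelope sketch, assuming traffic is proportional to “netspeed” and that every server in the zone pays the $750/year quoted in the first post; the variable names are mine.

```python
US_ZONE_NETSPEED_KBPS = 86_798_173  # total US-zone "netspeed" quoted above
SERVER_NETSPEED_KBPS = 1_000_000    # one server registered at 1 Gbps
COST_PER_SERVER_USD = 750           # yearly cost figure from the first post

# Assumption: a server's share of traffic equals its share of netspeed.
share = SERVER_NETSPEED_KBPS / US_ZONE_NETSPEED_KBPS
zone_cost = COST_PER_SERVER_USD / share

print(f"share of US zone: {share:.2%}")
print(f"extrapolated US cost: ${zone_cost:,.0f}/year")
```

With these inputs the share comes out a little over 1% and the extrapolated cost at roughly $65,000 per year, the same order of magnitude as the $64,000 estimate.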


#6

You’re paying way too much for network transfer. A Linode for $120 per year would handle around 10kpps.


#7

Have you checked out Amazon Lightsail? Amazon basically decided to copy DigitalOcean, only not quite as good, to capture the “OMG EC2 is so complicated and expensive!” market.

The plans mimic DigitalOcean’s transfer allotments, but overages are still $0.09/GB.

It’s only in us-east-1 so far.

Edit: I hate to sound like a shill, but if you’re fond of EC2, and want cheaper transfer…


#8

With Lightsail you get “cheaper bandwidth”; however, the CPU is likely to be the limiting factor in how much traffic you can serve.

There are various comparisons of AWS Lightsail with other providers out there, e.g.: https://www.vpsbenchmarks.com/posts/amazon_lightsail_1gb_is_no_match_for_10_vps_from_linode_do_vultr


#9

@cunnie Nice article :slight_smile:

I disagree with your recommendation to use the Google time smearing servers as upstreams for your NTP pool servers.

There is no doubt that using EC2 to serve NTP pool traffic is one of the more expensive ways to contribute.

As you’ve pointed out, a cheaper option is to contribute bandwidth to the pool from a VPS provider that offers full virtualisation:

Linode
Digital Ocean
Vultr
BuyVM
Ramnode
LunaNode
Atlantic
Arpnetworks

We do need to be mindful of the Tor “problem”, where contributors to that project are largely concentrated in a few networks, and yes, it would be far better to have lots of people contributing smaller “net speeds” from home connections with static IPs.

In summary, there are cheaper ways to contribute bandwidth to the pool than using EC2 :slight_smile:


#10

Would you be interested in making the total “net speed” visible to server owners in the management panel, so we’d get a rough idea of each server’s contribution in a specific zone?

It could be a largely static value that’s updated once a day per zone; it doesn’t have to be live and dynamic :slight_smile:


#11

I can say that Vultr works without problems. :slight_smile:

My two Australian servers are hosted at Vultr.

LeaseWeb is another option. They also provide VPSes with at least 4TB/month.


#12

I have two small cloud servers here in the EU with Scaleway. One is a VPS in Amsterdam and the other is a dedicated ARM-based machine in Paris. Both machines seem to max out the CPU at around 15Mbit of NTP traffic and run with a continuous load of around 5Mbit with 10Mbit spikes. They have no limit on traffic, but speed is capped at 200Mbit, and they cost 2.99€ apiece.

They seem to receive around 1.4TB of traffic in a month.
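
For comparison, a sustained rate can be converted into monthly volume. This is a small sketch assuming a 30-day month and decimal terabytes; the helper name is made up for illustration.

```python
def monthly_traffic_tb(avg_mbit_per_s: float, days: int = 30) -> float:
    """Convert a sustained rate in Mbit/s to decimal terabytes per month."""
    bytes_per_second = avg_mbit_per_s * 1_000_000 / 8
    return bytes_per_second * 86_400 * days / 1e12

print(f"{monthly_traffic_tb(5):.2f} TB/month at a sustained 5 Mbit/s")
```

A steady 5Mbit works out to about 1.6TB over 30 days, which is in the same range as the roughly 1.4TB observed.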


#13

We should actually encourage people in scientific institutions to set up time servers (as I did at the Rudjer Boshkovich Institute in Croatia, which presently keeps 4 servers; the fifth went down with a disk failure recently and will be replaced soon).
It is worth mentioning that much of the scientific community uses Linuces (and some other Unices), most of them set up to synchronise with the NTP pool. On my servers I have noticed, e.g., some EGI servers, clusters, various academic and national mail and web servers, and a lot of individual academic computers.
Having a time server can be interesting from the scientific point of view, and most scientific institutions have open internet. The machines run all the time anyway, most of them idling most of the time (even in reasonably big clusters, because scientific loads are exploration timed). All such institutions have a reasonable number of fixed IPv4 addresses, including reserves!
There should be no administrative barrier to providing a public service from academic networks, as they are financed to do just that anyway!
A public time-server installation can also serve to promote the institution.


#14

Thanks Ask, that’s a useful piece of information; I’ll incorporate it in the post.

And I second josephb’s motion to make “net speed” accessible, if it’s not too much work.


#15

Thanks Joseph, I have updated the blog post to recommend that NTP pool servers not use Google’s servers (should publish shortly).

I’ve used Arpnetworks and Linode in the past, and have had good experience with both.


#16

@mnordhoff: Thanks, Amazon Lightsail would be particularly interesting because it uses the same API as AWS which means that I can use BOSH (the tool I use to deploy my servers).

@josephb: Thanks for the link to the comparison of Amazon Lightsail with others; that will help inform how I size my VM.

@jan-philipp.benecke: Vultr sounds interesting; I might bring up an Australian server there.

@Hedberg: Scaleway has no limit on traffic? That’s awesome. I may bring up a server there.

@zorko: I agree, the Universities could help a lot here. In fact, my upstream time providers (stratum 2) were university servers.


#17

If you do so, I can recommend the National Measurement Institute:

http://www.measurement.gov.au/Services/Pages/TimeandFrequencyDisseminationService.aspx

You need to get your IP whitelisted there, and then you can use their servers as upstream time servers (all stratum 1).

For Scaleway, I can confirm that they have unlimited bandwidth (https://www.scaleway.com/)


#18

Normally, what does a co-lo charge you for one more bonded port?


#19

iocc, I only use IaaSes (i.e. VMs), not co-los (i.e. bare iron), so I’m afraid I’m not much help here.


#20

FWIW, I ran an experiment spinning up 2 VMs each on DigitalOcean and Linode (each one in a different data center). Given the same set of servers (including my stratum 1), the Linode VMs converged and stabilized very quickly. The DigitalOcean ones, not so much: they wandered +/- 25ms for 2 days with high system jitter until I killed them.

I just added the Linode servers to the pool (IPv4 and IPv6).