Stratum 1 vs. Stratum 2 and lower - traffic loads / GeoDNS?

So I noticed that when my server runs as Stratum 2 (or lower) it gets two to three times as much traffic, and the traffic just keeps growing, which is to be expected. But once I go back to my Stratum 1 server, the traffic drops drastically.

Any thoughts and ideas?

@daniel.quick the only thing the system does with the stratum value is to record it and mark the server "bad" if the stratum is an unusual value (0 or higher than 6, if I remember right).

I can't think of, and haven't heard of, a reason why a client would behave differently in the two scenarios.

Did you see the pattern consistently? Maybe it just coincided with when your server (randomly) rotated through a popular DNS resolver? How did it match up with when you changed the "net speed" of the server?

The change in request level coincided with the monitoring station discovering that the server went from Stratum 1 to Stratum 2. The Stratum 2 server is a Windows 10 server running Domain Time II, sync'd to a local Symmetricom SyncServer S350 PTP Grandmaster. According to the live running stats the time is sync'd to approximately +/- 0.000(+/-0-200). When the Stratum 1 is answering queries for NTP, Daytime, and Time, it's getting ~12 queries a second. The number jumps quite a bit when its turn for a heavier load comes around, but when the Windows 10 PC (Strat 2) starts answering, it slowly builds to hundreds, if not thousands, of queries a second after a day of running. When I switch the port forward back to the Stratum 1, it continues to handle the load, but once the monitoring station detects it's gone back to Strat 1, it almost instantaneously goes back to a drastically lower number of queries.

@daniel.quick, welcome to the community!

Interesting findings. May we have your IP(s) to be able to look at the data collected by the monitoring?

Sure. 68.97.68.79. Right now it's running the Strat 2 Windows 10 server. I'm also participating in the NTP Beta project, so there are also the other two monitoring stations reporting.

Don't know if this is relevant. When server is stratum 2 it usually advertises root dispersion=0.
Running of stratum 2 it advertises root dispersion in 0.21 - 0.55.

I suspect the problem is with the poll value. What NTP implementation is running on the stratum 2 server and how it is configured? If the server responded with a small poll, some clients like ntpd would take that as a suggestion to increase their polling rate. The LeoNTP units had (or maybe still have) this problem too.

Root dispersion is the total amount of uncertainty from the server to the S0 source. Since an S1 server is directly connected to an S0 source, it is usually 0. But each stratum below works out its own delay, which gets compounded into the root delay, and ntp then uses these values when deciding which servers are 'the best' to use for its time selection.
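
For anyone who wants to look at these fields directly: stratum, poll, root delay and root dispersion all sit in the first 16 bytes of an NTP packet. Below is a minimal Python sketch that decodes them, assuming the standard NTPv4 header layout. The `sample` packet is built by hand purely for illustration (it is not a capture from the server discussed here), and `parse_ntp_header` is a made-up helper name.

```python
import struct

def parse_ntp_header(packet: bytes) -> dict:
    """Decode the first 16 bytes of an NTP packet (NTPv4 header layout)."""
    if len(packet) < 16:
        raise ValueError("need at least 16 bytes")
    # B: LI/VN/Mode, B: stratum, b: poll (signed), b: precision (signed),
    # I: root delay, I: root dispersion (both 16.16 fixed point), 4s: reference ID
    flags, stratum, poll, precision, root_delay, root_disp, refid = struct.unpack(
        "!BBbbII4s", packet[:16]
    )
    return {
        "leap": flags >> 6,
        "version": (flags >> 3) & 0x7,
        "mode": flags & 0x7,
        "stratum": stratum,
        "poll": poll,                      # log2 of the interval in seconds
        "poll_seconds": 2.0 ** poll,
        "precision": precision,
        "root_delay": root_delay / 65536,  # seconds
        "root_dispersion": root_disp / 65536,
        "refid": refid,
    }

# Hypothetical server reply (version 4, mode 4 = server), built by hand
sample = struct.pack(
    "!BBbbII4s",
    (0 << 6) | (4 << 3) | 4,  # LI=0, VN=4, Mode=4
    2,                        # stratum
    6,                        # poll exponent
    -20,                      # precision
    0,                        # root delay
    0,                        # root dispersion
    bytes([192, 168, 69, 197]),
)
sample += bytes(32)  # the four 64-bit timestamps, zeroed for this example

hdr = parse_ntp_header(sample)
print(hdr["stratum"], hdr["poll"], hdr["poll_seconds"], hdr["root_dispersion"])
```

Feeding it the payload of a real reply (for example from a packet capture) would show which poll and root dispersion values a server actually sends.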

I'm using a program called Domain Time II from a company called Greyware. Further digging revealed this page, which may answer more of your questions. The answer to client queries was not immediately apparent to me on this single page.

Domain Time II Server - ntpd compatability

I'm using this program because it's among the few that allow a Windows 10 machine to have its time synchronized via PTP.

I've shifted my server back to Stratum 2 so we can look at the responses the server is answering with.

Correction: "When server is stratum 2 it usually advertises root dispersion=0.
Running of stratum 1 it advertises root dispersion in 0.21 - 0.55 msec."
Sorry for the typo. The point was that stratum 2 root dispersion is lower than stratum 1. That's not what I would normally expect.

Miroslav's suggestion makes sense.

Could it be because the machine's time is being synchronized via PTP, which can in theory provide up to picosecond synchronization? My machine's current offset is -0.0000018 µs. Since NTP measures in milliseconds, I'm not sure what value that would be interpreted/translated into and served to NTP server-client queries.
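
On the translation question, one thing that may help: the root delay and root dispersion fields in an NTP packet use the 32-bit "short" format (16 bits of seconds, 16 bits of fraction), so the smallest non-zero step is 2^-16 s, roughly 15 µs. The Python sketch below shows what that quantisation does to a few values. It assumes the server truncates rather than rounds, which is only a guess; nothing here is taken from Domain Time II itself, and the 5 µs input is a made-up example.

```python
SHORT_FRACTION_BITS = 16  # NTP short format: 16-bit seconds, 16-bit fraction

def to_ntp_short(seconds: float) -> int:
    """Encode seconds as a 32-bit NTP short value (assumption: truncating)."""
    return int(seconds * (1 << SHORT_FRACTION_BITS)) & 0xFFFFFFFF

def from_ntp_short(raw: int) -> float:
    """Decode a 32-bit NTP short value back to seconds."""
    return raw / (1 << SHORT_FRACTION_BITS)

# Smallest representable non-zero value, about 15.26 microseconds
resolution = from_ntp_short(1)
print(resolution)

# 5 us is a hypothetical dispersion; 0.21 ms and 0.55 ms are the values
# reported earlier in the thread
for value in (0.000005, 0.00021, 0.00055):
    raw = to_ntp_short(value)
    print(value, raw, from_ntp_short(raw))
```

Under that assumption, anything below about 15 µs would go out on the wire as 0, which would be one possible explanation for a root dispersion of exactly 0.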

Is your kernel fast enough to get those low values?
I doubt it.

My 3 GHz Quad core Intel CPU...

root@server:~# chronyc tracking
Reference ID : 47505300 (GPS)
Stratum : 1
Ref time (UTC) : Fri Feb 07 15:02:00 2020
System time : 0.000075142 seconds slow of NTP time
Last offset : -0.000116964 seconds
RMS offset : 0.000503468 seconds
Frequency : 32.714 ppm fast
Residual freq : -0.154 ppm
Skew : 14.348 ppm
Root delay : 0.000000 seconds
Root dispersion : 0.000531 seconds
Update interval : 16.0 seconds
Leap status : Normal

As you can see from the root dispersion, it's not even near picosecond values.
I doubt you can get there at all.

Here's the output from Domain Time II's raw data dump log. Formatting follows as well:

Check Time / Time Source / Set / PhAdj / Check Reason / Seconds / delta from Time Source (Report Source)

Fri Feb 07 2020 09:08:54 192.168.69.197 N -15 PTP - 0.0000076
Computer's clock set to check/update every 15 seconds

It seems the poll value the server responds with is hardcoded to 6. That might be OK in a local network, but doesn't work well on the Internet. All those ntpd clients with the default maxpoll of 10 would stay at 6, increasing their traffic by a factor of 16. If you see a 2-3x overall increase, it suggests that normally those clients would make up about 10% of your traffic.
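
To spell out the arithmetic behind the factor of 16 and the roughly 10% estimate, here is a short Python sketch. It assumes every affected client would otherwise settle at poll 10 and that the remaining clients are unaffected; the function names are made up for this example.

```python
def poll_traffic_factor(server_poll: int, client_maxpoll: int) -> int:
    """How many times more often a client polls when it is held at
    server_poll instead of reaching client_maxpoll.
    Poll intervals are 2**poll seconds."""
    return 2 ** (client_maxpoll - server_poll)

def affected_share(overall_increase: float, factor: int) -> float:
    """Fraction f of the baseline traffic that comes from affected clients,
    solving (1 - f) + factor * f = overall_increase for f."""
    return (overall_increase - 1) / (factor - 1)

factor = poll_traffic_factor(server_poll=6, client_maxpoll=10)
print(factor)  # 1024 s / 64 s

for increase in (2.0, 3.0):
    share = affected_share(increase, factor)
    print(increase, round(share * 100, 1))  # share in percent
```

A 2x increase corresponds to about 6.7% of the baseline traffic coming from such clients and a 3x increase to about 13.3%, so "about 10%" sits in the middle of that range.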

Thanks, I'll shift back to the stratum 1 server when I get home tonight. I'll contact the company to see if this is a configurable parameter elsewhere in the file system.

@mlichvar is right. Here is how it looks with each of the two servers:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-68.97.68.79     192.168.69.197   2 u   27   64  377  138.375    1.513   5.967
-68.97.68.79     .GPS.            1 u   46 1024  377  139.085    1.082   1.376

Look at the poll value. With the stratum two server it never went above 64 sec (2^6). With your stratum one server it went up to 1024 sec (2^10). That clearly explains the traffic quantity difference.

There is a similar server out there with a fixed poll value: http://support.ntp.org/bin/view/Servers/PublicTimeServer001438 . It is fixed at 4 (16 sec). I put "minpoll 10" in the definition of that server in my /etc/ntp.conf.

IIRC time.cloudflare.com also uses a fixed poll of 6 (don't know if they have already changed it).

They did change it. :slightly_smiling_face:

IIRC, they actually originally used a fixed poll of 0 (probably under a theory of "let's set unnecessary fields to 0"), which led to ntpd clients using their minpoll setting.
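
The cases discussed in this thread (server poll fixed at 6, at 4, or at 0) can be put into one small model. This is a simplified Python sketch of how an ntpd-style client might combine the server's advertised poll with its own limits. It is an approximation for illustration, not code taken from ntpd; `effective_poll` and `host_poll` are made-up names, and the defaults of minpoll 6 and maxpoll 10 are ntpd's documented defaults.

```python
def effective_poll(server_poll: int, host_poll: int,
                   minpoll: int = 6, maxpoll: int = 10) -> int:
    """Simplified model: the client keeps its own poll exponent within
    [minpoll, maxpoll], follows the server's value if that is smaller,
    but never goes below its own minpoll."""
    host_poll = max(minpoll, min(maxpoll, host_poll))
    return max(min(server_poll, host_poll), minpoll)

# host_poll=10 models a client that has had time to ramp up fully
cases = {
    "server follows client (poll 10)": effective_poll(10, 10),
    "server fixed at 6": effective_poll(6, 10),
    "server fixed at 0": effective_poll(0, 10),
    "server fixed at 4, client minpoll 10": effective_poll(4, 10, minpoll=10),
}
for name, poll in cases.items():
    print(f"{name}: poll {poll} = {2 ** poll} s")
```

In this model a server that always answers with poll 6 or 0 pins a default ntpd client at 64 s, while raising minpoll on the client side (as in the /etc/ntp.conf workaround above) overrides the server's value.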