I’m running 3 NTP servers, two of which were added two weeks ago in the India zone. The third one was added two days ago. Here are a few things that I noticed:
The two servers (let’s say A and B) I added earlier show a sawtooth pattern in the monitoring graph.
Servers A and C are on Oracle Cloud. I run my projects on them and, on the side, these NTP servers. The projects consume very few resources, so a resource crunch is not an issue.
Server B is a Raspberry Pi 4B in my home. It has a static IPv6 address, and local monitoring shows that 100% of requests were answered; however, it too shows a sawtooth pattern in the pool monitoring.
I can't figure out the cause of this. I switched from ntpd to chrony, but it made no difference.
Is this an issue on the monitoring end, i.e. are servers in the India/Asia zones not being monitored correctly? After all, two identical servers in the same location with the same configuration, but in different zones, show different monitoring results.
Or are the servers actually becoming unstable in the India zone for some reason?
Also, two different servers in different locations and on different networks, but in the same zone, show the same sawtooth graph.
Any help or pointers in the right direction for debugging would be appreciated.
I asked the team to move server C (from above) to the India, Asia zone. Remember, server C had been working flawlessly: a perfect score of 20, without a single dip.
As soon as the zone was changed to India, I saw a slight dip within a few hours, followed by the result shown below.
I got an email saying that this server is removed from the pool, which makes sense.
I hope this new finding will help in narrowing down the problem.
This may be caused by an unstable connection from Oracle India to the NTP Pool monitoring station in San Jose (as you know, some routes are more congested than others). Even if you have stable connections to every other destination, if your server's connection to the San Jose monitor is unstable, the score will definitely show a sawtooth pattern.
P.S. The monitoring timeout is only 500 ms, rather than the usual 3000 or 5000 ms, so when you test network stability by sending ping packets from your NTP server toward the San Jose monitoring station, remember to set the timeout to 500 ms.
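As a rough way to check whether replies make it back within that 500 ms window, here is a minimal SNTP probe sketch in Python. The monitoring station's address isn't given in this thread, so the target host in the example is a placeholder you would replace; the 0.5 s default mirrors the monitor's timeout:

```python
import socket
import time

def sntp_rtt(server, timeout=0.5):
    """Send one SNTP client request (mode 3, version 4) and return the
    round-trip time in seconds, or None if no reply arrives in time."""
    packet = b'\x23' + 47 * b'\x00'  # LI=0, VN=4, Mode=3; rest zeroed
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        t0 = time.monotonic()
        sock.sendto(packet, (server, 123))
        sock.recvfrom(48)  # a valid SNTP reply is 48 bytes
        return time.monotonic() - t0
    except OSError:  # covers timeouts and unreachable networks
        return None
    finally:
        sock.close()

# 192.0.2.1 is an unroutable documentation address (RFC 5737),
# so this probe never gets an answer and returns None.
print(sntp_rtt("192.0.2.1", timeout=0.2))
```

Running this in a loop against the monitor's address would show whether the occasional reply drifts past 500 ms even when plain ping looks healthy.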
This is my new NTP server; it has been running for more than a day and its score only goes up, which suggests the monitoring station and the India zone themselves have no problem.
I have 3 servers running at Linode in Mumbai, which are also prone to some flakiness. This affects both IPv4 and IPv6; see, e.g., pool.ntp.org: Statistics for 2400:8904::f03c:92ff:fe80:f6ac
Also, routing from there to my home internet connection (Deutsche Telekom, AS3320) shows about 5% packet loss.
If you need traceroutes or other tests, I can happily help out with that.
@fxxkputin I was running another NTP server from my home, which had the same problem. It's not only Oracle: the ISPs Jio and Airtel show the same problems. By the way, can you share the address of the San Jose monitoring station? I'll share ping results from all the systems I've ever run an NTP server on.
Thanks @lordgurke, but I think I'll wait a few months until my current VPS subscription expires and we move to a dedicated host. In the meantime, I'll also investigate the network issues I may have missed earlier.
Okay, for the people on Oracle Cloud who are having this issue: it is caused by the conntrack table on the VNIC filling up and dropping packets. You can identify this by looking at the metrics under Compute > Instances > Instance Details > Attached VNICs > VNIC Details
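On the host side (as opposed to the VNIC metrics), you can check how close you are to the kernel's own conntrack limit by comparing its two counters. A minimal sketch, assuming a Linux host with the nf_conntrack module loaded; the `proc_root` parameter is only there so the function can be pointed at a test directory:

```python
from pathlib import Path

def conntrack_usage(proc_root="/proc/sys/net/netfilter"):
    """Return (current, maximum, percent_used) from the kernel's
    connection-tracking counters, or None if they are unavailable."""
    try:
        current = int(Path(proc_root, "nf_conntrack_count").read_text())
        maximum = int(Path(proc_root, "nf_conntrack_max").read_text())
    except (OSError, ValueError):
        return None
    return current, maximum, 100.0 * current / maximum

usage = conntrack_usage()
if usage:
    print("conntrack: %d / %d entries (%.1f%% used)" % usage)
```

If the percentage sits near 100 during busy periods, dropped NTP packets (and a sawtooth score) are exactly what you'd expect.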
To overcome this you have to enable stateless rules in your security list. I have connection tracking essentially bypassed in my case, since I use a firewall on the host itself.
Here’s what my setup looks like:
Ingress Rules
This took longer than I'd like to admit to figure out, but it solved my issue entirely.
Also, don't forget to increase the conntrack table size on the host itself; I've seen mine go as high as 1,000,000 entries on my busiest server.
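For reference, raising that limit is a sysctl change; a sketch, assuming a Linux host with nf_conntrack loaded (pick a value that fits your RAM, since each tracked connection costs a few hundred bytes of kernel memory, and the file name under /etc/sysctl.d/ is just an example):

```shell
# Check the current usage against the limit
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# Raise the limit for the running kernel (not persistent across reboots)
sudo sysctl -w net.netfilter.nf_conntrack_max=1048576

# Persist it
echo 'net.netfilter.nf_conntrack_max = 1048576' | sudo tee /etc/sysctl.d/90-conntrack.conf
```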