Collapse of Russia country zone

Have you considered they are blocking the monitors of the pool?

And as such the number of good servers drops, while the rest end up with all the requests?

If there is a firewall in place, maybe you should point the monitoring-system out to them and tell them it’s only there to keep GOOD NTP-servers online.

But if they block us, they are hurting only their own NTP-systems by doing so.

I wonder: Do they block monitors? Do they block NTP traffic?

I do not know the answer…

Same issue. I can only speak for my monitor: it is unlimited and unregulated, and it will check any NTP-server it is asked to check.

The question should be, is China blocking our monitors?
If so, can we ask them to stop blocking us and explain what the monitors do?

Sorry, but I doubt the problem is the monitors.

When I check by hand:

bas@workstation:~$ ntpdate 139.199.215.251
ntpdig: no eligible servers

It’s not my side blocking access. Same for Russia in my opinion.
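For anyone who wants to repeat that kind of manual check without ntpdate, here is a minimal sketch of a single SNTP client query in Python. The function names are made up for illustration, and this is not what the pool monitors actually run; it only sends one mode 3 request to UDP port 123 and reads the transmit timestamp from the reply.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    # LI = 0, version = 4, mode = 3 (client) gives a first byte of 0x23;
    # the remaining 47 bytes of the 48-byte header are left at zero.
    return bytes([0x23]) + bytes(47)


def parse_response(packet: bytes) -> dict:
    # Decode a few header fields and the transmit timestamp (bytes 40..47).
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    first, stratum = packet[0], packet[1]
    secs, frac = struct.unpack("!II", packet[40:48])
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": stratum,
        "transmit_unix": secs - NTP_EPOCH_OFFSET + frac / 2**32,
    }


def query(host: str, timeout: float = 2.0):
    # Returns the parsed reply, or None if nothing usable came back in time.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, 123))
            data, _ = sock.recvfrom(512)
            return parse_response(data)
        except (OSError, ValueError):
            return None
```

Calling `query("139.199.215.251")` would return `None` on a timeout. Note that a timeout on its own cannot tell apart a firewall, strict rate limiting, or a server that is simply overloaded or offline.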

Sorry, I misunderstood. That quote also contained part of a message of mine, so I inferred that it was also the subject of your comment.

Not sure many more are needed. Some more diversity would be welcome, e.g., also in the zone with the largest estimated number of Internet users, or in that region in general.

E.g., some fellow timekeepers in China had enlisted a university’s resources to set up a monitor, but their requests for help with setting it up, both here in this forum and, as I understand, via more direct channels with those responsible for the pool, apparently never got anywhere, so I guess they eventually gave up.

If that change you propose would make it easier to set up monitors without going through a single bottleneck, that would indeed be helpful and welcome. My VPSs in Germany are underutilized as servers, but instead of decommissioning them when the first fixed-term subscription is up, at least one of them might make a well-connected monitor (though that wouldn’t really boost the diversity aspect I highlight above, just adding another city in Germany as monitor location).

There’s obviously always things to optimize, especially if that means that it subsequently frees resources to do other important stuff. So I hope this will precipitate work on removing the lock-in of clients into their (assumed) respective country zones in order to spread the load wider especially in under-served zones like the ones this thread is about, with the titular consequences for the zone.

Sure, with that backdrop, the China zone will likely never have the vibrant assortment of living-room hosted servers contributing to the pool as there is in Europe or the USA. But the parallel thread shows that people are still willing to set up servers in data-centers with capacity similar to living-room hosted servers elsewhere, but structural issues with the pool prevent that. Because as it is right now, the entry barrier for such small servers is just too high, much higher than, e.g., in Europe or the USA. And as you say, there’s only so much incentive for big players to add on top of the resources they also contribute to make that chicken-and-egg problem go away.

I thought there previously had been somewhat broad consensus in this forum that the tight lock-in of clients to servers only from their own country zone (as best as that can be determined by the pool) is “bad”. E.g., a while back, when that was discussed I think in the context of the paper by @giovane, @marco.davids, et al., or around that time, Ask had mentioned that he is preparing something like that. Because beyond the threats discussed in that paper, this lock-in is causing multiple different but real-life issues, documented time and time again on this forum, but then subsiding for a while at least because I guess people give up, not getting anywhere. That is not specific to the China zone, or the Russia zone, or any other. But it would be something that the pool can do to improve the situation, also in China and Russia, and I understood endeavored to do so at least in general. It now just needs to happen some time…

And there’s been no evidence that I am aware of that either of those conjectures would be true, but some evidence that both of them most likely are not.

As I wrote before, why is it so hard to grasp that when an estimated 132,000,000 Internet users want to get time from fewer than 10 active servers, that is not going to work for smaller servers like those of the operators who contribute to this thread?

I understand to some extent how difficult this might be to grasp if one has not experienced it oneself, as I hardly get 2 Mbit/s traffic at a 3 Gbit netspeed setting on my servers in Germany myself. I guess the situation might be similar in the USA.

But when I set up my first server in Singapore, I realized what it really means to have a server in an under-served zone, because I am getting peaks above 2 Mbit/s already at the 512 kbit netspeed setting (and as that is the lowest setting currently available, I had to remove that server from the pool because of that the other day). In South Korea, it is similar. So I can only imagine what the situation in China, or now in Russia, must be.

I never said it was. In fact, I chose this example in that context precisely because of that, because like with the other examples, I had the impression from related contributions, and based on who contributed, that it might get some traction. And despite it being so challenging, if its implementation were to help with the issues discussed, e.g., in this thread, I’d be happy to see it implemented. But as I see similar challenges to what you mention, my fear is that if it were implemented, it could further delay implementation of some fixes, or at least some mitigations to the main problem: clients being locked to servers of their own zone only.

But obviously, what is being worked on, or isn’t, is rather opaque to me at least, the last few features that came out were nothing I had on the radar before they came out. If you happen to have better insights, I’d be happy to hear, and many others as well I guess. I guess part of the frustration with all these issues people are having is because there is pretty much no communication on what is being worked on, and what the plans are, to at least get a perspective as to when things might improve.

This question has been answered many times:

NO

Welcome to the club! May I respectfully refer you to @avij’s previous post, and invite you to reread it and take it to heart. It is still applicable in this current thread.

Sure? I cannot get time from the IP given.
How can you be sure they don’t block us? I do not know; my command gets no answer. The monitor will get the same result.
If NO, how come I can’t reach it by hand?

CGNAT is a problem, I have it too.
To counter that, you can either block those IPs, rate-limit them, or simply contact those abusive CGNAT providers and tell them to intercept port 123 and run their own servers.

To be blunt, I hate lazy CGNAT ISPs too, and trust me, my servers are being attacked by them too.

I do put ratelimits on those idiots. Chrony has a good line for it: ratelimit :rofl:

May I respectfully invite you to re-read, and take to heart, what @avij wrote previously:

I have read it…but do you understand his problem?

He’s having a major load on his server, I get it.

All the rest too.

So why can’t I get time from his server when I test? As that is what the monitor does.

How come the number of ntp-servers in Russia dropped?

Do they drop monitor requests? You say NO, I’m not so sure.

Because it may be offline, or have strict rate limiting in place, in an attempt to somehow deal with the issue?

If you look at the offset/score graph, you’ll see that it was active during earlier periods, and apparently became unreachable only fairly recently.

Same with other servers, once removed from the pool, the scores recover.

So it is unlikely that there is some broad blocking of monitors. Similar in China, by the way.

That’s the proverbial million dollar question.

When looking at the data that is shared as part of this thread, you can see that it kind of started when a big chunk of capacity was removed from the zone, for whatever reason, but the number of servers did not drop in proportion to that.

The hypothesis is that the traffic that that supposedly “big” server was getting originally was now redistributed among all the other servers still in the pool.

As the zone was possibly on edge already as it was, that additional load overloaded some other servers, so they dropped, first just in score, then more and more actually removed from the pool, perhaps people got fed up by too much traffic, as participants in this thread mention they are considering, or know of others who did.

So as servers dropped like flies from the pool, the situation now is that some 10 servers serve an estimated 132,000,000 users (exact numbers not important, just the relation, e.g., to other zones).
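To put that relation into rough numbers, here is a back-of-the-envelope sketch in Python. The user and server counts are the ones quoted in this thread. Everything else is an assumption for illustration only: one client per user, a 1024 s poll interval, an even split across servers (the pool actually weights by netspeed), and 76 bytes per packet at the IPv4 level.

```python
USERS = 132_000_000      # estimated Internet users in the zone (from the thread)
SERVERS = 10             # approximate number of active servers (from the thread)
POLL_INTERVAL_S = 1024   # ASSUMPTION: one client per user, polling every 1024 s
PACKET_BYTES = 76        # 48-byte NTP payload + 8-byte UDP + 20-byte IPv4 header


def per_server_load(users, servers, poll_interval_s, packet_bytes=PACKET_BYTES):
    """Return (queries per second, Mbit/s in one direction) for an even split."""
    qps = users / servers / poll_interval_s
    mbit_s = qps * packet_bytes * 8 / 1e6
    return qps, mbit_s


qps, mbit = per_server_load(USERS, SERVERS, POLL_INTERVAL_S)
print(f"{qps:,.0f} queries/s, {mbit:.1f} Mbit/s per server")
# prints "12,891 queries/s, 7.8 Mbit/s per server"
```

Real clients often poll far more often than that, and many users have several devices, so this is if anything a low estimate. Even so, it is well beyond what a small server at a low netspeed setting is meant to handle.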

DDoS attacks were considered as well, but given that the traffic load seems responsive to a server being removed from the pool, either by score dropping below 10, or being put in “monitoring only” mode, that seems unlikely (though, as always, not impossible).

I do know Putin said there would be a Russian Internet/Firewall.

No, I’m not going into politics, but it might explain they drop our monitors.

Same for China.

I also do not know if all servers dropped, they could still be working.

Has anyone asked Putin why it happened? Does anyone have his email?

Please, there are a lot of conjectures and rumors, one could almost say conspiracy theories.

But so far, there is no data to support that.

On the other hand, there is data/evidence to the effect that there is no filtering or anything of that sort to the extent it would cause the problems discussed here and the parallel thread.

If you have actual data, or other evidence that can be looked at, please share. But peddling hearsay and conjectures does not help.

Well I did my best…I wrote Putin:


Addressee
to the President of Russia’s electronic reception office
Full name (surname, name, patronymic)
.........
Email address
.........
Phone
+32xxxxxxx
Text
Dear Mr. Putin,

I'm one of the members of the pool.ntp.org organisation, and we noticed a drop in NTP (timeservers) in Russia. These servers get time via port 123UDP. At the same time, monitors all over the world check if timeservers are working. Same port 123UDP. On most requests we get no answer. What happens then is that time-servers are taken offline, and the few servers that do work get all load of Russia on them. Up to 1TB/second. No server can handle this. I presume your firewall is blocking our monitors and as such collapsing the Russian Time-server system.
Regardless of the war, time is not a political thing. NTP-pool.org does not care about anything but time.

I hope you hereby open 123UDP on your firewall to allow monitors for timeservers to function.
All timeservers have open-source code, including the monitors.

Please allow this to work, as it hurts Russian people with their computers.
There is no threat in this.

A simple GPS device of 5 euro can do the same, picking time out of the air.

I hope you open the firewall to allow NTP packets cross both ways.

Best regards,

Bas xxxxxxxxx - Belgium.

A shot not taken is always missed…ask him…?

I don’t think so. China’s firewall operates on a blacklist mechanism. It only blocks websites that are on the blacklist (such as Google, YouTube, etc.) or shuts down high-risk traffic (such as vless, a VPN protocol, etc.). It’s clear that the NTP protocol is not within the scope of what is blocked. I believe there might be other reasons for the monitor’s inability to connect. As for the undersea cables between China and the US, I found that there are two, one with a bandwidth of 80 Gbps and the other with 5.12 Tbps (the information I found might be outdated; if anyone has more recent news, please let me know). However, China has too many users, and during peak afternoon hours, there might be data transmission far exceeding the capacity of the cables, and UDP from the monitor might be discarded (note that this is just my speculation, but I don’t think it has a direct relationship with network blocking).

How come, then, that when the server is not in the pool, its score nicely recovers, and stays well above 10?

The graph is not perfect, and I am not saying there aren’t issues with connectivity at all, and things couldn’t be better. But if the speculation were true, then the score would need to actually go up and down noticeably throughout the day even for a server that is in “monitor only” mode.

When, however, local traffic is added, the problems start, in your case, because it’s easily too much of it.

But also, e.g., in case of Tencent. From probes within China, there is about 10% packet loss. I am not talking about the monitors, but actual clients inside China. How do you figure that international connections being overloaded cause traffic inside China to be affected that way? Or, the other way round, how should monitors within China help if there is 10% packet loss for traffic within China?

And I am not saying connectivity within China is bad, pings to one of the Tencent servers show pretty much no packet loss.

So my personal conclusion is that even the Tencent servers are overloaded, until further evidence rather than speculation is presented that hints in other directions. Or a plausible line of reasoning based on existing evidence but leading to a different conclusion is presented. Or flaws in the above line of reasoning, or interpretation of current evidence (which is far from being definitive proof).

And repeating such speculation also doesn’t help to get the problem solved, because it blocks the view to what I’ve seen it boil down to here and elsewhere:

Adding more capacity is needed, and as that presents a chicken-and-egg problem, and faces structural obstacles in some places, spreading the current load more widely is needed, because this lock-in of clients to servers of their own zone is not only causing issues here, but similar, and other issues elsewhere as well.

As others have pointed out as well, blaming the monitors against all current evidence is not helpful. I am not saying they are perfect, and having more diversity would be welcome. But they are not the decisive issue in either China or Russia (based on current evidence).

And the gradual relaxation/removal of the lock-in has already been pretty much agreed upon in this forum because it is “bad” for many reasons. Now, it just needs to be implemented - contingent, unfortunately, on the resolution of some non-trivial challenges in the process.

On Saturday I saw over 1600 packets/second from 213.183.x.y.
I think the rate could be higher if the bandwidth were not exhausted by other ntp-clients.
I submitted an abuse report to the abuser’s ISP but still have not received any reply.

Thanks for sharing! Very interesting. And while NTP itself then seems not in scope, it is likely that just inspecting a large number of packets for some perceived “risk” indirectly may affect latency-sensitive NTP traffic as well to contribute to the wide spread and variability in offsets seen in the graph. Though even then, some monitors seem to see a rather steady offset despite that, perhaps because of shorter network paths for them.

What if the system auto-blacklists the monitors? That would prevent servers behind the firewall from being checked and listed in the pool,
as we would then score all those NTP-servers as unreachable, ergo 0 points.

The monitors do ‘hit’ their NTP servers, always from the same source to the same target, so they could see that as an ‘attack’.

I mean, I use fail2ban myself to counter attacks on my servers. It is very effective, but it depends on logging and counters: when you hit the counter too often, the door is closed for a period of time.
If they use that mechanism in their firewall, it will explain the ‘packet loss’.
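As a toy illustration of that counter-and-ban idea, here is a small Python sketch. It is not fail2ban’s actual implementation, and nothing is known about whether any firewall discussed here works this way. The class name and the thresholds (5 hits per 60 s, 600 s ban) are made up for the example.

```python
from collections import defaultdict, deque


class BanCounter:
    """Toy counter-based limiter: ban a source after too many hits in a window."""

    def __init__(self, max_hits=5, window_s=60.0, ban_s=600.0):
        # All three thresholds are hypothetical values chosen for illustration.
        self.max_hits, self.window_s, self.ban_s = max_hits, window_s, ban_s
        self.hits = defaultdict(deque)   # source -> timestamps of recent hits
        self.banned_until = {}           # source -> time at which the ban expires

    def allow(self, source, now):
        if self.banned_until.get(source, 0.0) > now:
            return False                 # still banned: the packet is dropped
        recent = self.hits[source]
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()             # forget hits older than the window
        recent.append(now)
        if len(recent) > self.max_hits:
            self.banned_until[source] = now + self.ban_s
            recent.clear()
            return False                 # this hit tripped the counter
        return True
```

With these made-up thresholds, a source probing steadily every 15 seconds never trips the counter, while a burst of ten packets within a second gets banned from the sixth packet on. Whether a monitor would be caught by such a mechanism therefore depends entirely on the thresholds and on how often it probes.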

I fail to see how congestion would show such major drops. It will slow things down, sure, but it won’t block them. Of course UDP has no resend options…but it should not block every attempt.

My 2 cents.