The author abandoned the distributed lock service they were writing; it might be worth reading the original Google paper on Chubby, Google's distributed lock service, to understand why it has endured there: https://static.googleusercontent.com/media/research.google.c...
> I abandoned the project after it very quickly become apparent that despite having written the service in this super fast, brand new language called golang, the service just wasn’t fast enough to handle the scale we threw at it.
This makes me think the author wishes to use the distributed lock service for some purpose that's not well served by distributed locks. It's not that distributed locks are bad, it's just that the author seems to have a particular use case already in mind that's poorly suited to a distributed lock service.
> author wishes to use the distributed lock service for some purpose that's not well served by distributed locks
Exactly. It should be clear that a distributed lock service has a finite, and low, overall rate of progress; it obviously cannot be on the critical path of every transaction globally. But when it's used for events that rarely happen, such as electing a new master for a database partition, or something else that happens once a week, the low throughput is not an issue.
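A back-of-envelope sketch of that ceiling (the RTT and hold-time numbers below are illustrative assumptions, not measurements): if every acquisition costs a network round trip and the service serializes grants, the whole system is capped at roughly one grant per round trip plus hold time.

```rust
// Back-of-envelope throughput ceiling for a centralized lock service.
// Both timing figures are assumed for illustration.
fn main() {
    let rtt_ms = 2.0; // assumed round trip to the lock service
    let hold_ms = 10.0; // assumed time the lock is held per operation
    // If every transaction must take the lock, grants are serialized:
    let max_ops_per_sec = 1000.0 / (rtt_ms + hold_ms);
    println!("ceiling: ~{:.0} lock-protected ops/sec", max_ops_per_sec);
    // Roughly 83 ops/sec under these assumptions: fine for a weekly
    // leader election, hopeless on the critical path of every transaction.
}
```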
You can get into that type of lock congestion trouble in any language. It's an algorithm problem, not a language problem.
I discovered last year that Wine has terrible internal lock problems inside its user-side storage allocator. That's in C. If you have enough threads calling "realloc", the allocator goes into futex congestion collapse and performance drops by two orders of magnitude. My graphics program went from 60 FPS to 0.5 FPS. They optimized too hard for the no-congestion case.
This is a Wine-only problem; Microsoft's own code doesn't have this problem.
I've had lock congestion problems in Rust. Sometimes you need a fair mutex, or something gets frozen out. Both fair and non-fair mutexes are available; see the "parking_lot" crate.
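For reference, the default `std::sync::Mutex` is the unfair one: it's correct under contention, but makes no FIFO guarantee, which is how a fast re-acquiring thread can freeze others out. A minimal sketch of the contended case (the `FairMutex` type mentioned in the comments belongs to the `parking_lot` crate, which isn't used here):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Four threads hammering one std::sync::Mutex. The result is always
// correct, but std makes no fairness promise: a thread that drops and
// immediately re-takes the lock can starve the others. parking_lot's
// FairMutex is the drop-in alternative when FIFO hand-off matters.
fn main() {
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let c = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..10_000 {
                *c.lock().unwrap() += 1; // contended critical section
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // Mutual exclusion holds regardless of fairness:
    assert_eq!(*counter.lock().unwrap(), 40_000);
}
```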
There's a place inside WGPU that has a lock congestion problem in one of three locks, and I'm going to have to add more profiling to someone else's code to find that. I can see the problem with Tracy, but need to add more profiling scopes to narrow it down.
But that is high-performance graphics stuff, where microseconds count. Sending spam (OK, bulk marketing emails) doesn't need to be that tightly coupled. Mailing list removal runs on a timescale of days, not milliseconds. What else in that space has to be tightly interlocked?
A good language just gives you the necessary tooling to do whatever you want, it doesn't magically fix problems.
For example, only languages with a formal memory model, like C++, allow you to do lock-free programming at all (C and Rust adopted the C++ memory model).
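As a minimal illustration of what that memory model buys you, here is a lock-free counter built on a compare-and-swap retry loop, written in Rust (a sketch, not production code); the `Ordering` arguments are exactly the knobs the C++11 model defines:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Lock-free increment via compare-and-swap: no mutex, so no thread can
// block another; a losing thread just observes the new value and retries.
fn lock_free_add(counter: &AtomicU64, delta: u64) {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            current,
            current + delta,
            Ordering::AcqRel,  // ordering on success
            Ordering::Relaxed, // ordering on failure
        ) {
            Ok(_) => return,
            Err(observed) => current = observed, // another thread won; retry
        }
    }
}

fn main() {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..10_000 {
                    lock_free_add(&c, 1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(counter.load(Ordering::SeqCst), 40_000);
}
```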
Also, what kind of serious person allocates memory from the system allocator in a real-time loop? Your problems seem self-inflicted. Regardless there are many allocators that optimize for concurrent allocations: tcmalloc, jemalloc, mimalloc...
Considering the vast number of programs that Wine works extremely well with, I'm not so sure they spent too much effort optimizing the no-congestion case. You are just doing something extremely quirky in your program.
I've looked at the code in a debugger. Wine has futexes three deep in "malloc". The innermost one is a pure spinlock. The problem with "realloc" is that, when it can't grow an array in place, it has to copy the contents. The Wine implementation does that with the main lock on allocation still held. So, if you have Rust code with a lot of multithreaded vector "push" operations, and more threads than CPUs, you get futex congestion. It's possible to write applications that don't hit this bug, but it's Wine-only, not Windows, so not worth it.
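The workload pattern described above can be sketched like this: many threads growing vectors from empty, so the allocator sees a stream of concurrent realloc-style calls as capacities double. Sizing each vector up front with `with_capacity` is one workaround when the allocator serializes those reallocs (this demonstrates the allocation pattern, not the Wine bug itself):

```rust
use std::thread;

fn main() {
    // Eight threads each growing a vector from empty: every capacity
    // doubling is a realloc, and the copy happens inside the allocator's
    // growth path -- the pattern that hits the contended lock.
    let total: usize = (0..8)
        .map(|_| {
            thread::spawn(|| {
                let mut v = Vec::new();
                for i in 0..100_000u64 {
                    v.push(i); // occasional realloc as capacity doubles
                }
                v.len()
            })
        })
        .collect::<Vec<_>>()
        .into_iter()
        .map(|h| h.join().unwrap())
        .sum();
    assert_eq!(total, 800_000);

    // Preallocating removes the regrowth, and with it the realloc calls:
    let mut v = Vec::with_capacity(100_000);
    for i in 0..100_000u64 {
        v.push(i);
    }
    assert_eq!(v.len(), 100_000);
}
```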
What's "quirky" is trying to use all the CPUs with lower priority threads.
Does the language make a huge difference here? In a distributed system a signal to be sent over the network travels at the same speed whether it was transmitted by a C or Python program, right?
Actually, it's really _hard_ in Go to make CPU-bound control flow, state, and allocation. Do goroutines have any notion of locality? I've been looking and haven't been able to find anything.