Looks like the C version is probably more than twice as fast as the .NET and JDK versions (which are now close)[1]. Someone needs to give the C version the full modern-hardware treatment (e.g. SIMD, perhaps even some ASM) to see where we land.
The repo for the original Java-based challenge states that station names will be between 1 and 100 bytes long. In practice, the test data uses shorter names, and the first few bytes of each name happen to be distinct from other names, so you can cut some corners and still arrive at the correct result.
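To make the corner-cutting concrete, here is a minimal sketch in C. It assumes the shortcut is "key the hash table on the first 8 bytes of the name"; `prefix_key` and `full_hash` are hypothetical helpers for illustration, not taken from any of the submitted solutions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shortcut: pack up to the first 8 bytes of a station name
   into a 64-bit key. Only safe if every name in the data set is unique
   within its first 8 bytes, which the stated rules do NOT guarantee. */
uint64_t prefix_key(const char *name, size_t len)
{
    assert(len >= 1 && len <= 100);   /* limits stated in the challenge repo */
    uint64_t key = 0;
    size_t n = len < 8 ? len : 8;
    memcpy(&key, name, n);            /* bytes not copied stay zero */
    return key;
}

/* Rule-compliant alternative: 64-bit FNV-1a over the whole name. */
uint64_t full_hash(const char *name, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)name[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}
```

With made-up names, `prefix_key("Alexandria", 10)` and `prefix_key("Alexandra", 9)` return the same key because both start with "Alexandr", so a table keyed only on the prefix would merge the two stations. That is the kind of input the 1-to-100-byte rule allows but the shortcut cannot handle.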
One results in MOVSX, the other in MOVZX [1]. The difference is thus sign versus zero extension when moving to the larger register. However, they seem to perform pretty much identically, if I'm reading Agner Fog's instruction tables correctly.
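A small C sketch of the difference, assuming the two variants being compared are a signed and an unsigned 8-bit load widened to `int`. The function names are made up for illustration, and the exact instructions depend on compiler and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Signed byte widened to int: x86-64 compilers typically emit MOVSX,
   which copies the sign bit into the upper bits. */
int load_signed(const int8_t *p)
{
    return *p;
}

/* Unsigned byte widened to int: typically MOVZX,
   which fills the upper bits with zeros. */
int load_unsigned(const uint8_t *p)
{
    return *p;
}

/* Same bit pattern, different result once the top bit is set. */
void widen_demo(void)
{
    uint8_t raw = 0xF0;
    assert(load_signed((const int8_t *)&raw) == -16);
    assert(load_unsigned(&raw) == 240);
}
```

For bytes below 0x80 (plain ASCII) both functions return the same value, so the choice only changes results for bytes with the high bit set.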
I've left the Microsoft world. That said, the dotnet environment is pretty frickin' cool, not least for the mostly excellent documentation across a lot of libraries. And MS keeps throwing money at development. It's just not better enough to attract Linux devs brought up entirely in the Linux world. And I suspect most dotnet devs who jump entirely into Linux abandon dotnet because their new shop doesn't use it.
We develop and deploy dotnet on Linux at my shop, and it works great. I can recommend it to everyone; the dev experience is wonderful. With Rider as a development environment, my team really couldn't be happier. I've done a lot of development in C, Java and Python, but C# feels a lot more solid, and developing with it truly is a joy.
Can confirm, dotnet works great on Linux. We have a lot of dotnet production systems running on Linux and Docker for many years now with zero issues. Also recommend Rider in Linux for development, even our Windows folks prefer it over VS nowadays.
This was maybe a problem in the early days of .NET Core, but since .NET 6 or so they have reimplemented (almost) all the stuff from the old .NET Framework, and that in turn made porting third-party libraries relatively easy, so most of them are ported as of today (at least the more or less maintained ones).
It's wild to me that they would write performance-focused code (unsafe ptrs, memory-mapped files, etc.) and then still use LINQ in the main work loop [0]. This implementation could go much, much faster. Of course, rewriting the parallelism and aggregation without LINQ is a decent amount of effort, but I'm betting that applying the lazy man's hack [1] would give decent gains.
You're assuming they didn't try that. I wrote a blog post about a benchmark once and posted it on Reddit/HN, and everyone was like "clearly he should've tried X and Y", which I had, but I didn't mention it because I had already covered many other things I thought were more interesting... So, my advice is: try it before you accuse someone of not having done it properly. I can almost guarantee you'll find it's not better. If you find it's better, then at least you don't make a fool of yourself by suggesting innocuous changes. Performance is very surprising sometimes.
[1] https://github.com/dannyvankooten/1brc