The One Billion Row Challenge – .NET Edition

npalli · on Jan 7, 2024

Looks like the C version is probably more than twice as fast as the .NET and JDK versions (which are now close) [1]. Someone needs to give the C version the full modern-hardware treatment (e.g. SIMD, perhaps even some ASM) to see where we land.

[1] https://github.com/dannyvankooten/1brc

tuwtuwtuwtuw · on Jan 7, 2024

That "char city[32]" looks like a challenge. Let's hope Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch is not in the test set.

npalli · on Jan 7, 2024

Is it really a C program if you cannot execute a buffer overflow attack?

RodgerTheGreat · on Jan 7, 2024

The repo for the original Java-based challenge states that station names will be between 1 and 100 bytes long. In practice, the test data uses shorter names, and the first few bytes of each name happen to be distinct from other names, so you can cut some corners and still arrive at the correct result.
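
To make the length limit concrete, here is a minimal sketch in C of a name buffer sized for the 100-byte maximum mentioned above instead of `char city[32]`. It is not taken from any of the linked repos. The names `MAX_NAME_BYTES`, `struct station` and `parse_station_name` are hypothetical, and the sketch assumes the challenge's `name;temperature` line format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Upper bound for station names, per the limit quoted in the thread. */
#define MAX_NAME_BYTES 100

/* Hypothetical record type, for illustration only. */
struct station {
    char name[MAX_NAME_BYTES + 1]; /* +1 for the terminating NUL */
    size_t name_len;
};

/* Copies the bytes that precede ';' in `line` into `out->name`.
   Returns 0 on success, or -1 if there is no ';' within `line_len`
   bytes, or if the name is empty or longer than MAX_NAME_BYTES. */
int parse_station_name(const char *line, size_t line_len, struct station *out) {
    assert(line != NULL && out != NULL);

    const char *sep = memchr(line, ';', line_len);
    if (sep == NULL) {
        return -1;
    }

    size_t len = (size_t)(sep - line);
    if (len == 0 || len > MAX_NAME_BYTES) {
        return -1; /* reject instead of overflowing the buffer */
    }

    memcpy(out->name, line, len);
    out->name[len] = '\0';
    out->name_len = len;
    return 0;
}
```

With this layout the long Welsh name mentioned earlier fits, since it is well under 100 bytes, and a 101-byte name is rejected rather than written past the end of the array.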

tuwtuwtuwtuw · on Jan 10, 2024

If the application doesn't have to be implemented according to spec, then I propose we skip reading the file and just hard code the results.

buybackoff · on Jan 7, 2024

I'm the author of the fastest .NET version. Ask me anything about the implementation. The code is very short and simple.

belinder · on Jan 7, 2024

You wrote this on your repo

> Extending short => int is more expensive than short => ushort => uint => int

Can you explain why?

ygra · on Jan 7, 2024

One results in MOVSX, the other in MOVZX [1]. The difference is thus sign extension versus zero extension when moving to the larger register. However, they seem to perform pretty much identically if I'm reading Agner Fog's instruction tables correctly.

[1] https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcDMACR2DC2A3ut...

EDIT: Ah, the other reply notes that this is likely only visible in a method that also does calculations and thus keeps those values in registers.

anonymoushn · on Jan 7, 2024

If the value starts out in a register, short => int is a sign-extend instruction. short => ushort => int is 0 instructions.
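
The two conversion paths can be sketched in C to show what each one computes. This is an illustration only, not the author's C# code, and the function names `widen_signed`, `widen_unsigned` and `check_non_negative_inputs_agree` are hypothetical. Which instructions a compiler actually emits depends on the compiler and the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

/* Direct widening: the sign bit must be replicated into the upper
   16 bits (a sign extension, e.g. MOVSX on x86-64). */
int32_t widen_signed(int16_t v) {
    return (int32_t)v;
}

/* Widening through unsigned types: the upper 16 bits are simply
   zero (a zero extension, e.g. MOVZX on x86-64). */
int32_t widen_unsigned(int16_t v) {
    return (int32_t)(uint32_t)(uint16_t)v;
}

/* Asserts that both paths give the same result for every
   non-negative 16-bit input, which is the only range where the
   unsigned path is a valid substitute. */
void check_non_negative_inputs_agree(void) {
    for (int32_t i = 0; i <= INT16_MAX; i++) {
        assert(widen_signed((int16_t)i) == widen_unsigned((int16_t)i));
    }
}
```

For an input of -1, `widen_signed` returns -1 while `widen_unsigned` returns 65535, so the unsigned route only works when the value is known to be non-negative.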

buybackoff · on Jan 7, 2024

The sibling comments said it sooner and better

neonsunset · on Jan 7, 2024

There is also now a direct comparison with the Java results: https://github.com/buybackoff/1brc#results

jackfoxy · on Jan 7, 2024

I've left the Microsoft world. That said the dotnet environment is pretty frickin' cool. Not least of all for the mostly excellent documentation across a lot of libraries. And MS keeps throwing money at development. It's just not better enough to attract Linux devs brought up entirely in the Linux world. And I suspect most dotnet devs who jump entirely into Linux abandon dotnet because their new shop doesn't use it.

plusmax1 · on Jan 7, 2024

We develop and deploy dotnet on Linux at my shop, and it works great. I can recommend it to everyone; the dev experience is wonderful. With Rider as a development environment my team couldn't be happier really. I did a lot of development in C, Java and Python, but C# feels a lot more solid and developing with it truly is a joy.

minimeme · on Jan 7, 2024

Can confirm, dotnet works great on Linux. We have had a lot of dotnet production systems running on Linux and Docker for many years now with zero issues. I also recommend Rider on Linux for development; even our Windows folks prefer it over VS nowadays.

foverzar · on Jan 7, 2024

No issues with libraries that require old Windows ASP.NET?

This was a frequent issue each time I tried using something from the .NET ecosystem on Linux.

minimeme · on Jan 8, 2024

This was maybe a problem in the early days of .NET Core, but since .NET 6 or so they have reimplemented (almost) all the stuff from the old .NET Framework. That in turn enabled relatively easy porting of third-party libraries, so most of them are ported as of today (at least the more or less maintained ones).

kikimora · on Jan 7, 2024

Never encountered this in the last 5 years working on .NET + Linux systems.

SeanKilleen · on Jan 8, 2024

We're developing in .NET and publishing Linux containers deployed to Kubernetes. It's been an absolute joy!

eyegor · on Jan 7, 2024

It's wild to me that they would write performance-focused code (unsafe pointers, memory-mapped files, etc.) and then still use LINQ in the main work loop [0]. This implementation could go much, much faster. Of course, rewriting the parallelism and aggregation without LINQ is a decent amount of effort, but I'm betting that applying the lazy man's hack [1] would yield decent gains.

[0] https://github.com/buybackoff/1brc/blob/main/1brc/App.cs#L12...

[1] https://github.com/reegeek/StructLinq

brabel · on Jan 7, 2024

You're assuming they didn't try that. I wrote a blog post about a benchmark once and posted it on Reddit/HN, and everyone was like "clearly he should've tried X and Y", which I had, but didn't mention because I had already mentioned many other things which I thought were more interesting... So, my advice is: try it before you accuse someone of not having done it properly. I can almost guarantee you'll find it's not better. If you find it's better, then at least you don't make a fool of yourself by suggesting innocuous changes. Performance is very surprising sometimes.

buybackoff · on Jan 7, 2024

Author here. Nothing really matters other than `ProcessChunk`. Don't you think I kind of know what I'm doing with these numbers!?

anonymoushn · on Jan 7, 2024

I don't think much time is spent in this code.

miga · on Jan 8, 2024

Would anybody host a language-agnostic comparison?