Wow 24MH/s on an i7 with 8 cores sounds really good!
I don't know how I got it working, but I'm now at 3GH/s with my OpenCL implementation. I basically converted 90% of my Rust logic to OpenCL, and now my GPU is at 100% usage. I also had to switch to a TTY, as my window manager became unresponsive, haha.
I'm kind of glad about this HN post, as I had absolutely no clue how SHA-256 and OpenCL worked before this challenge.
I'm glad you had some fun! This experiment went about as well as I could hope!
If anyone's curious, I'm getting 4.5MH/s single-threaded and 12.2MH/s multi-threaded on a slightly old i7 with 4 cores.
It's my own C++ implementation, which I've made about 20% faster than the fastest one I found online (Zig/stdlib; I also tried Go/stdlib, C++/cgminer, Rust/ring, C++/VanitySearch and Python/stdlib).
I think it might be faster just because I was able to skip some steps, since all inputs are short and of the same length.
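For anyone wondering what "skipping some steps" can look like, here is a minimal sketch in Python (for readability, not speed). It assumes every input fits in a single 64-byte SHA-256 block (at most 55 bytes) and that all inputs share one length, so the padding and length field are built once and only the message bytes change per attempt. The names `make_template` and `sha256_fixed` are made up for illustration; this is not the C++ code described above, just one plausible version of the idea.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
# SHA-256 round constants and initial state (FIPS 180-4).
K = (
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
)
H0 = (0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """Run the SHA-256 compression function on one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h)))


def make_template(length):
    """Build the padded block once for a fixed message length (<= 55 bytes)."""
    assert 0 <= length <= 55, "longer messages need a second block"
    block = bytearray(64)
    block[length] = 0x80                           # mandatory padding marker
    block[56:64] = struct.pack(">Q", length * 8)   # message length in bits
    return block


def sha256_fixed(template, msg):
    """Hash msg by overwriting only the message bytes of the template.

    Assumption: len(msg) equals the length passed to make_template.
    """
    template[:len(msg)] = msg
    return struct.pack(">8I", *compress(H0, template))


template = make_template(8)
digest = sha256_fixed(template, b"00000042")
```

The per-attempt work is then one slice assignment plus one compression call: no length bookkeeping, no buffering, no finalisation step. The result can be checked against `hashlib.sha256(b"00000042").digest()`.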
I've just finished testing 10^12 inputs. I think I'll stop with 10 zeroes, which is very likely to happen in the next couple of days, according to my calculations. I might revisit it later to learn some GPU programming.
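The kind of estimate mentioned above can be sketched as follows. This assumes "zeroes" means leading hexadecimal digits of the digest (an assumption, the thread does not say so explicitly) and that each hash is an independent trial; the function names are made up for illustration.

```python
import math


def expected_attempts(zero_digits):
    """Average number of hashes needed for zero_digits leading hex zeros."""
    return 16 ** zero_digits


def success_probability(attempts, zero_digits):
    """Chance of at least one hit within the given number of attempts."""
    p = 16.0 ** -zero_digits
    # 1 - (1 - p)**attempts, computed in a numerically stable way.
    return -math.expm1(attempts * math.log1p(-p))


def expected_hours(zero_digits, hashes_per_second):
    """Average search time in hours at a constant hash rate."""
    return expected_attempts(zero_digits) / hashes_per_second / 3600.0


attempts_needed = expected_attempts(10)          # 16**10 == 2**40
chance_so_far = success_probability(10**12, 10)
hours_at_12_mhs = expected_hours(10, 12.2e6)
```

Under those assumptions, 10 zeros take 2^40 (about 1.1 * 10^12) hashes on average, so 10^12 attempts give roughly a 60% chance of having found one already.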
Got an optimised C++ version with no deps averaging about 24 MH/s on an i7-11800H.
I've got 9 zeros; if I get a result that ranks top 10 I think I'll submit and call it a day.