FWIW I got it working with Rust and OpenCL; most of it was written by ChatGPT, as I have no clue about OpenCL.
GPU usage is only 50-60% and I get 100MH/s.
With hashcat and OpenCL I could get 12GH/s, but I couldn't find a way to use hashcat for this use case.
Wow 24MH/s on an i7 with 8 cores sounds really good!
I don't know how I got it working, but I'm now at 3GH/s with my OpenCL implementation. I basically converted 90% of my Rust logic to OpenCL, and now my GPU is at 100% usage. I also needed to switch to a tty, as my window manager became unresponsive haha
I'm kind of glad about this HN post, as I had absolutely no clue about how SHA-256 and OpenCL worked before this challenge.
I'm glad you had some fun! This experiment went about as well as I could hope!
If anyone's curious, I'm getting 4.5MH/s single-threaded and 12.2MH/s multi-threaded on a slightly old i7 with 4 cores.
It's my own C++ implementation, which I've made about 20% faster than the fastest one I found online (Zig/stdlib; I also tried Go/stdlib, C++/cgminer, Rust/ring, C++/VanitySearch and Python/stdlib).
I think it might be faster just because I was able to skip some steps, since all inputs are short and of the same length.
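To illustrate the kind of shortcut that short, same-length inputs allow, here is a minimal Python sketch (not the C++ code described above, and far slower). It assumes inputs of at most 55 bytes, so the message, the 0x80 padding byte and the 64-bit length all fit in one 64-byte block and a single compression call is enough. The names `sha256_short`, `_icbrt` and `_primes` are made up for this example; the round constants are derived from the first 64 primes as defined in the SHA-256 standard rather than typed out.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    """Return the first n primes by trial division (n is tiny here)."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(x):
    """Integer cube root: largest r with r**3 <= x (binary search)."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
# Initial state: first 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def _rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256_short(msg: bytes) -> bytes:
    """SHA-256 of a message of at most 55 bytes, using one compression call."""
    if len(msg) > 55:
        raise ValueError("this sketch only handles single-block messages")
    # With a fixed input length, everything after the message bytes is a
    # constant, so a tuned implementation could precompute it once.
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = H0
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    state = [(x + y) & MASK for x, y in zip(H0, (a, b, c, d, e, f, g, h))]
    return struct.pack(">8I", *state)


# Sanity check against the standard library.
assert sha256_short(b"abc") == hashlib.sha256(b"abc").digest()
```

Whether this matches the steps that were actually skipped in the C++ version is a guess; it only shows why a general-purpose hasher (buffering, multi-block loop, padding logic on every call) does more work than this case needs.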
I've just finished testing 10^12 inputs. I think I'll stop with 10 zeroes, which is very likely to happen in the next couple of days, according to my calculations. I might revisit it later to learn some GPU programming.
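For anyone who wants to redo that kind of estimate, here is a rough Python sketch of the arithmetic. It assumes "10 zeroes" means 10 leading hex digits of the digest (so each hash succeeds with probability 1/16^10) and that the 12.2MH/s multi-threaded rate quoted above stays constant; both are assumptions, and the function names are made up for illustration.

```python
import math


def expected_hashes(hex_zeroes: int) -> int:
    """Average number of hashes needed to see this many leading hex zeroes."""
    return 16 ** hex_zeroes


def chance_found(tried: int, hex_zeroes: int) -> float:
    """Probability that at least one of `tried` hashes has enough zeroes."""
    p = 1.0 / expected_hashes(hex_zeroes)
    # 1 - (1 - p)**tried, written with log1p/expm1 to keep precision for tiny p.
    return -math.expm1(tried * math.log1p(-p))


def hours_for(hashes: float, rate_per_second: float) -> float:
    """Hours needed to compute `hashes` hashes at a constant rate."""
    return hashes / rate_per_second / 3600.0


target = expected_hashes(10)
print(target)                                   # 1099511627776, i.e. 2**40
print(round(hours_for(target, 12.2e6), 1))      # hours per 16**10 hashes
print(round(chance_found(10**12, 10), 2))       # chance after 10**12 tries
```

Under those assumptions 16^10 is about 1.1 * 10^12, so one "expected" batch of hashes takes roughly a day at that rate, which is in line with the "next couple of days" estimate.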