I bought a Dell Precision 7910 with two Xeon E5-2687W v3 CPUs (10 cores, 20 threads each), 32GB RAM, and a 512GB SSD for $425 including shipping. I found that Windows 11 Pro recognizes only 20 of the 40 logical cores/threads. I don't feel a need to upgrade to a more expensive Microsoft OS at this time, so I just run Ubuntu natively on that box, which recognizes all of them. Assuming used DDR4 RAM returns to more reasonable prices at some point, I intend to load that box up to the 768GB max.
I'm always happy to see more innovation in this area. It'd be great if you could make your model, weights, and training corpus public (preferably under a permissive license) on GitHub. It'd also be great if you could run some benchmarks against similar tools in this area (I'm thinking particularly of Mathpix, Equatio, and Microsoft's math OCR in OneNote, Word, and the Azure APIs). If you make your test corpus and code available, I could set up the benchmarks for you.
I agree that it would be nice if the model were open-weights and could run locally.
I have digitized almost all of my handwritten college notes. I would love to transcribe them, check them for errors, and contribute them as training data, but only for open-weights models.