That's cool — am I right in assuming you want to automate the production of efficient GPU (or other accelerator) code based on these low-level primitives? You would still need a piece of sorcery that can produce high-performance OpenCL code, right? And that code could be different for every device, so you would need some trial-and-error, benchmark-based compilation at the very least. Or would the OpenCL code be written by hand for each device?
We're working on parameterizing a search space that includes more than just the local work-group size. The end dream is ML-guided search to optimize the kernels :)
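To make the idea concrete, here's a minimal sketch of benchmark-based tuning over a multi-dimensional search space. Everything here is hypothetical: the knob names (`local_size`, `vector_width`, `tile_size`) and the `benchmark` cost surface are made up for illustration — a real tuner would compile each OpenCL kernel variant and take the median of several timed runs on the actual device, and ML-guided search would replace the exhaustive loop with a model that proposes promising configs.

```python
import itertools
import random

# Hypothetical tuning knobs beyond just the local work-group size.
SEARCH_SPACE = {
    "local_size": [16, 32, 64, 128, 256],
    "vector_width": [1, 2, 4, 8],
    "tile_size": [1, 2, 4],
}

def benchmark(config):
    """Stand-in for compiling a kernel with `config` and timing it on the
    device. Here we fake a smooth cost surface plus a little noise, purely
    so the example runs without a GPU."""
    return (abs(config["local_size"] - 64) / 64
            + abs(config["vector_width"] - 4) / 4
            + abs(config["tile_size"] - 2) / 2
            + random.random() * 0.01)

def exhaustive_tune(space, bench):
    """Trial-and-error search: benchmark every configuration, keep the
    fastest. This is the baseline an ML-guided searcher would improve on
    by pruning the space instead of enumerating it."""
    keys = list(space)
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        t = bench(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

random.seed(0)
best, t = exhaustive_tune(SEARCH_SPACE, benchmark)
print(best)
```

Even this toy space has 5 × 4 × 3 = 60 points; real spaces (unrolling factors, memory layouts, per-device variants) explode combinatorially, which is exactly why guided search becomes attractive.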
OK, generally I think you're doing exactly what ML is lacking right now. Another huge opportunity: instead of taking the average neural network and designing accelerators for it, design hardware-friendly networks that run well on a sane accelerator built to work with only these specialised networks (one that doesn't need 80% of its chip area for on-chip memory, for example). These might end up being completely different networks from what researchers use today. I work in this area, and I think it's also possible to use the loss function to optimise the network for a specific piece of hardware.
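One way to read "use the loss function to optimise the network for a specific HW" is to add a hardware penalty to the training objective. The sketch below is an assumption about what that could look like, not anyone's actual method: the on-chip memory size, the overflow-based cost proxy, and the weighting `lam` are all illustrative numbers.

```python
def hw_cost(params_bytes, on_chip_bytes):
    """Crude proxy for hardware friendliness: parameters that spill past
    on-chip memory cost off-chip bandwidth, so penalize the overflow.
    The byte figures are illustrative, not from any real accelerator."""
    overflow = max(0, params_bytes - on_chip_bytes)
    return overflow / on_chip_bytes

def hw_aware_loss(task_loss, params_bytes,
                  on_chip_bytes=2 * 1024 * 1024, lam=0.1):
    """Combined objective: the usual task loss plus a weighted hardware
    penalty, so architecture search or plain model selection trades
    accuracy against fitting the target chip."""
    return task_loss + lam * hw_cost(params_bytes, on_chip_bytes)

# A model that fits on-chip pays no penalty...
small = hw_aware_loss(0.30, params_bytes=1 * 1024 * 1024)
# ...while a slightly more accurate one that spills off-chip can end up
# worse under the combined objective.
big = hw_aware_loss(0.25, params_bytes=8 * 1024 * 1024)
print(small, big)
```

With a differentiable cost proxy (e.g. expected bit-width or sparsity instead of a hard byte count), the same idea plugs into gradient-based training rather than after-the-fact model selection.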