Copying in a response to a similar question on Reddit:
I should really add some discussion around BLAS in particular, which has a good implementation[0] of the float32 dot product that outperforms any of the float32 implementations in the blog post. I'm getting ~1.9m vecs/s on my benchmarking rig.
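For reference, a minimal sketch of calling gonum's blas32.Dot (which I take to be the function linked at [0]) on two float32 vectors, with made-up example data:

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/blas/blas32"
)

func main() {
	// Two example float32 vectors; Inc is the stride between elements.
	x := blas32.Vector{N: 4, Inc: 1, Data: []float32{1, 2, 3, 4}}
	y := blas32.Vector{N: 4, Inc: 1, Data: []float32{5, 6, 7, 8}}

	// Dot computes sum_i x[i]*y[i] via the underlying BLAS routine.
	fmt.Println(blas32.Dot(x, y)) // 70
}
```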
However, BLAS became unusable for us as soon as we switched to quantized vectors, because there is no int8 implementation of the dot product in BLAS (though I'd love to be proven wrong).
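Without an int8 BLAS routine, the fallback is a plain scalar loop. A minimal Go sketch (not our production code) of an int8 dot product that accumulates into int32 to avoid overflow:

```go
package main

import "fmt"

// dotInt8 is a straightforward scalar int8 dot product. The products are
// accumulated in int32 so they can't overflow the int8 range.
func dotInt8(a, b []int8) int32 {
	var sum int32
	for i := range a {
		sum += int32(a[i]) * int32(b[i])
	}
	return sum
}

func main() {
	a := []int8{1, -2, 3, 4}
	b := []int8{5, 6, -7, 8}
	fmt.Println(dotInt8(a, b)) // 1*5 + (-2)*6 + 3*(-7) + 4*8 = 4
}
```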
[0]: https://pkg.go.dev/gonum.org/v1/gonum@v0.14.0/blas/blas32#Do...