I did consider Avo! I even went as far as implementing a version with Avo, since it has a nice dot product example I could use as a starting point. But ultimately, for functions this small, I felt that Avo was an unnecessary extra layer to grok. Additionally, it's x86-only, and I knew in advance I'd want to implement an ARM version as well, since we also do some embeddings stuff locally.
If I were ever to take this further and add loop unrolling or something, I'd absolutely reach for Avo.