Yes, it's all wrong, because: a) recall is designed to measure binary relevance, but vector scores are not good relevance judgments and they aren't binary; b) most models optimise purely for distance, which makes nDCG look great but causes content to clump together. This loses local ranking precision, and the noise in the ordering produced by the embeddings is significantly greater than the error introduced by the ANN approximation; c) bi-encoders have significantly greater error than cross-encoders. Basically, every vector DB is blowing at least an order of magnitude more resources than it needs to in order to optimise bi-encoding efficiency, which is the wrong target anyway.
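To make point (a) concrete, here is a minimal sketch of how recall@k and nDCG@k treat the same two result lists. Everything in it is made up for illustration: the document IDs, the graded judgments, and the helper names (`recall_at_k`, `ndcg_at_k`) are assumptions, and the DCG uses the linear-gain variant rather than the exponential one.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Binary relevance: fraction of the relevant set found in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gains and a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(retrieved, grades, k):
    """Graded relevance: DCG of the list divided by the DCG of the ideal ordering."""
    gains = [grades.get(doc, 0) for doc in retrieved]
    ideal = sorted(grades.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical graded judgments (3 = best); "d4" is unjudged and counts as 0.
grades = {"d1": 3, "d2": 2, "d3": 1}
relevant = set(grades)  # binarised view, which is all that recall can see

good_order = ["d1", "d2", "d3", "d4"]
bad_order = ["d3", "d2", "d1", "d4"]  # same documents, worst-first

recall_good = recall_at_k(good_order, relevant, 3)
recall_bad = recall_at_k(bad_order, relevant, 3)
ndcg_good = ndcg_at_k(good_order, grades, 3)
ndcg_bad = ndcg_at_k(bad_order, grades, 3)

print(f"recall@3 good={recall_good:.2f} bad={recall_bad:.2f}")
print(f"nDCG@3   good={ndcg_good:.2f} bad={ndcg_bad:.2f}")
```

Both lists get the same recall@3 because the metric only asks whether each relevant document made the cut; only the graded metric notices that the second list puts the best document last.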

Disclaimer: I work at Algolia.