I agree that BM25 retrieval + vector re-ranking can work. But vector search does bring results to the table that vanilla BM25 can't, even with a large retrieval window. So I do think there is a place for both with the usual "it depends on your data/requirements" caveat.
The opposite is also true. BM25 favors lexical matches and brings back candidates that vector search often misses.
I'm not disputing that vectors are useful, but I think benchmark-based evidence is not the same as deploying a solution that must scale, be constantly updated, and serve the many use cases customers want, like filtering, search syntax, etc.
Plus, I think there's a real danger of herding to vector retrieval (and even then, to just one view of it), which cuts off exploration of diverse solutions.
I'm with you 100% on not herding to any one way for any problem. I still remember the pre-2023 world where you weren't pressured to work LLMs into everything.
And I'm sure you're aware of the BEIR paper: https://arxiv.org/abs/2306.07471. Elastic references that in this blogpost: https://www.elastic.co/blog/improving-information-retrieval-...