So nice! That's an excellent extract and looks useful for benchmarking Meilisearch. I'll probably spend my Christmas holidays importing the tracks, albums, and artists into Meilisearch, while my CEO builds a beautiful front-end for it. I'll probably replace [the current music search demo](https://music.meilisearch.com) we have with this much higher-quality dataset!
That would also be a good fit for [the new delta-encoded posting lists I am working on](https://github.com/meilisearch/meilisearch/pull/5985). Let's see how good it can get. My early benchmarks showed a 50% reduction in disk usage.
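For readers curious what delta encoding of posting lists means in practice, here is a toy Python sketch (not the actual Rust implementation in the PR): because a posting list stores document ids in sorted order, you can store the gap from the previous id instead of the absolute id, and the resulting small integers compress much better (e.g. with a varint encoding).

```python
def delta_encode(doc_ids):
    """Store the gap from the previous doc id instead of the absolute id.

    Posting lists are sorted, so gaps are small integers that compress
    well with a variable-length integer encoding."""
    prev = 0
    gaps = []
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps

def delta_decode(gaps):
    """Rebuild the absolute doc ids with a running sum over the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids

postings = [3, 7, 8, 120, 121]
gaps = delta_encode(postings)          # [3, 4, 1, 112, 1]
assert delta_decode(gaps) == postings  # lossless round trip
```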
Someone reported it, and I answered today [1]. It's a rule that is too hard on the front end, and we will fix it by using a better Hybrid search setup (not only semantic). Thank you for the report.
Right. We released a lot of new engine versions to improve indexing. v1.12 improved document indexing a lot! Have you tried the latest version, v1.14, which we released yesterday?
Meilisearch is capable of limiting its resident memory (actual mallocs), but it requires a bare minimum of about 1 GiB.
35 GiB is probably a third of the data I index into Meilisearch just for experimenting, and that's before accounting for the inverted indexes. You wouldn't want to use an O(n) algorithm to search your documents.
Also, every time you need to reboot the engine you would have to reindex everything from scratch. Not a good strategy, believe me.
The best you can do is put Meilisearch on a very good NVMe drive. I am indexing large streams of content (Bsky posts + likes), and I assure you that I tested Meilisearch on a not-so-good NVMe and a slow HDD, and oh boy, the SSD is so much faster.
I am sending hundreds of thousands of messages and changes (of the likes count) into Meilisearch, and so far, so good. It's been a month, and everything is working fine. We also shipped the new batch stats, showing a lot of internal information about indexing step timings [1], to help us prioritize.
Meilisearch recently improved indexing speed and simplified the update path: v1.12 brought a large indexing speedup [1], and the dumpless upgrade feature improved the upgrade path [2].
The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.
That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid costs and remove the indexing time.
Thank you for the response here. Not being able to upgrade the machine without completely re-indexing has actually become a huge issue for me. My use case is that I need to upgrade the machine to perform a big indexing operation that happens all at once and then after that reduce the machine resources. Typesense has future plans to persist the index to disk but it's not on the road map yet. And with the indexing improvements, Meilisearch may be a viable option for my use case now. I'll be checking this out!
> I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).
You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].
Perhaps on a technical level, but for a dev, if I just need to install Postgres and some plugins and, boom, I have a fully searchable index, it's even easier.
Meilisearch decided to use hybrid search and avoid fusion ranking. We plan to work on reranking soon, but as far as I know, our hybrid search is so good that nobody asked for reranking. You can read more about our Hybrid search in our blog post [1].
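For the curious, here is a minimal sketch of what a hybrid search request body looks like; the index name (`songs`) and embedder name (`default`) are placeholders you would replace with your own configuration.

```python
import json

# Hypothetical index and embedder names. The `hybrid` object is what
# tells Meilisearch to blend keyword and semantic results.
search_request = {
    "q": "melancholic piano",
    # semanticRatio: 0.0 = pure keyword search, 1.0 = pure semantic search
    "hybrid": {"semanticRatio": 0.7, "embedder": "default"},
    "limit": 10,
}

# POST this body to /indexes/songs/search with your API key.
body = json.dumps(search_request)
```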
Regarding streaming ingestion support: Meilisearch supports basic HTTP requests and is capable of batching tasks to index them faster. In v1.12 [2], we released our new indexer version, which is much faster, makes heavy use of parallel processing, and reduces disk writes.
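As a rough sketch of what streaming ingestion can look like from the client side (the URL, key, and index name below are placeholders for a local setup): you chunk your document stream and POST each chunk; Meilisearch enqueues each request as an asynchronous task and merges consecutive tasks into larger indexing batches on its own.

```python
import json
import urllib.request

MEILI_URL = "http://localhost:7700"  # assumed local instance
API_KEY = "masterKey"                # placeholder API key
INDEX_UID = "songs"                  # hypothetical index name

def chunks(docs, size):
    """Split a document stream into fixed-size batches."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_batch(batch):
    """POST one batch of documents; Meilisearch answers immediately
    with a task id and indexes asynchronously."""
    req = urllib.request.Request(
        f"{MEILI_URL}/indexes/{INDEX_UID}/documents",
        data=json.dumps(batch).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)

docs = [{"id": i, "title": f"track {i}"} for i in range(2500)]
batches = list(chunks(docs, 1000))  # 3 batches: 1000, 1000, 500
# for batch in batches:
#     send_batch(batch)  # uncomment against a running instance
```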
I'm a little confused by your statement that "Meilisearch decided to use hybrid search and avoid fusion ranking" when your website [1] says "Hybrid search re-ranking: The final step involves re-ranking results from both retrieval methods using the Reciprocal Rank Fusion (RRF) algorithm."
Can you clarify what you mean by "fusion ranking"?
All hybrid search requires a method to blend keyword and vector search results. RRF is one approach; cross-encoder-based reranking is another.
Hello, I implemented hybrid search in Meilisearch.
Whether it uses re-ranking or not depends on how you want to stretch the definition. Meilisearch does not use the rank of the documents in each list of results to compute the final list of results.
Rather, Meilisearch attributes a relevancy score to each result and then orders the results in the final list by comparing the relevancy score of the documents in each list of results.
This is usually much better than any method that uses the rank of the documents, because the rank of a document doesn't tell you if the document is relevant, only that it is more relevant than documents that ranked after it in that list of hits. As a result, these methods tend to mix good and bad results. As semantic and full-text search are complementary, one method is best for some queries and the other for different queries, and taking results by only considering their rank in their respective list of results is really bizarre to me.
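A toy sketch can make the difference concrete (this is an illustration of the argument above, not Meilisearch's actual implementation): RRF only sees ranks, so a barely relevant top-ranked keyword hit weighs the same as a highly relevant top-ranked semantic hit, while a score-based merge preserves the magnitude of relevancy.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    Only the position of each document in each list matters."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def score_merge(scored_lists):
    """Score-based fusion: keep each document's best relevancy score
    across lists (scores assumed comparable), then sort by that score."""
    best = {}
    for scored in scored_lists:
        for doc, score in scored.items():
            best[doc] = max(best.get(doc, 0.0), score)
    return sorted(best, key=best.get, reverse=True)

# Keyword search barely likes "a"; semantic search strongly likes "b".
keyword = {"a": 0.2, "c": 0.1}
semantic = {"b": 0.95, "c": 0.9}

rrf_order = rrf([["a", "c"], ["b", "c"]])       # rank-only: "c" wins, "a" ties "b"
score_order = score_merge([keyword, semantic])  # score-aware: "b" wins
```

Under RRF, the weakly relevant "a" ends up level with the highly relevant "b" because both ranked first in their list; the score-based merge keeps them apart.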
I gather other search engines might be doing it that way because they cannot produce a comparable relevancy score for both the full-text search results and the semantic search results.
I'm not sure why the website mentions Reciprocal Rank Fusion (RRF) (I'm just a dev, not in charge of this particular blog article), but it doesn't sound right to me. Maybe something got lost in translation. I'll try and have it fixed. EDIT: Reported, this is being fixed.
By the way, this way of comparing scores from multiple lists of results generalizes very well, which is how Meilisearch is able to provide its "federated search" feature, which is quite unique across search engines, I believe.
Federated search allows comparing the results of multiple queries against possibly multiple indexes or embedders.
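For a sense of the request shape, here is a minimal federated search sketch (the index names are placeholders): a single `/multi-search` call with a top-level `federation` object asks Meilisearch to merge all queries' results into one list ordered by comparable relevancy scores.

```python
import json

# Hypothetical index names. With "federation" present, results from all
# queries are merged into a single ranked list instead of being returned
# as separate per-query result sets.
federated_request = {
    "federation": {"limit": 10},
    "queries": [
        {"indexUid": "tracks", "q": "daft punk"},
        {"indexUid": "albums", "q": "daft punk"},
        {"indexUid": "artists", "q": "daft punk"},
    ],
}

# POST this body to /multi-search with your API key.
body = json.dumps(federated_request)
```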