So nice! That's an excellent extract and looks useful for benchmarking Meilisearch. I'll probably spend my Christmas holidays importing the tracks, albums, and artists into Meilisearch, while my CEO builds a beautiful front-end for it. I'll probably replace [the current music search demo](https://music.meilisearch.com) we have with this much higher-quality dataset!
That would also be a good fit for [the new delta-encoded posting lists I am working on](https://github.com/meilisearch/meilisearch/pull/5985). Let's see how good it can get. My early benchmarks showed a 50% reduction in disk usage.
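For readers curious what delta encoding of posting lists means in practice, here is a toy Python sketch (not the actual Rust implementation in the PR): because a posting list stores document ids in sorted order, you can store the gap from the previous id instead of the absolute id, and the resulting small integers compress much better (e.g. with a varint encoding).

```python
def delta_encode(doc_ids):
    """Store the gap from the previous doc id instead of the absolute id.

    Posting lists are sorted, so gaps are small integers that compress
    well with a variable-length integer encoding."""
    prev = 0
    gaps = []
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps

def delta_decode(gaps):
    """Rebuild the absolute doc ids with a running sum over the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids

postings = [3, 7, 8, 120, 121]
gaps = delta_encode(postings)          # [3, 4, 1, 112, 1]
assert delta_decode(gaps) == postings  # lossless round trip
```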
Someone reported it, and I answered today [1]. It's a rule that is too hard on the front end, and we will fix it by using a better Hybrid search setup (not only semantic). Thank you for the report.
Right. We released a lot of new engine versions to improve indexing. v1.12 improved document indexing a lot! Have you tried the latest version, v1.14, which we released yesterday?
Meilisearch is capable of limiting its resident memory (actual mallocs), but it requires a bare minimum of about 1 GiB.
35 GiB is probably a third of the data I index into Meilisearch just for experimenting, and that's before accounting for the inverted indexes. You wouldn't want to use an O(n) algorithm to search your documents.
Also, every time you need to reboot the engine you would have to reindex everything from scratch. Not a good strategy, believe me.
The best you can do is put Meilisearch on a very good NVMe drive. I am indexing large streams of content (Bsky posts + likes), and I assure you that I tested Meilisearch on a not-so-good NVMe and a slow HDD, and oh boy, the SSD is so much faster.
I am sending hundreds of thousands of messages and changes (of the likes count) into Meilisearch, and so far, so good. It's been a month, and everything is working fine. We also shipped the new batch stats, showing a lot of internal information about indexing step timings [1], to help us prioritize.
Meilisearch recently improved indexing speed and simplified the update path: v1.12 brought a large indexing speedup [1], and the dumpless upgrade feature improved the upgrade path [2].
The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.
That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid costs and remove the indexing time.
Thank you for the response here. Not being able to upgrade the machine without completely re-indexing has actually become a huge issue for me. My use case is that I need to upgrade the machine to perform a big indexing operation that happens all at once and then after that reduce the machine resources. Typesense has future plans to persist the index to disk but it's not on the road map yet. And with the indexing improvements, Meilisearch may be a viable option for my use case now. I'll be checking this out!
> I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).
You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].
Perhaps on a technical level, but for a dev, if I just need to install Postgres and some plugins and, boom, I have a fully searchable index, it's even easier.
Meilisearch decided to use hybrid search and avoid fusion ranking. We plan to work on reranking soon, but as far as I know, our hybrid search is so good that nobody asked for reranking. You can read more about our Hybrid search in our blog post [1].
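For the curious, here is a minimal sketch of what a hybrid search request body looks like; the index name (`songs`) and embedder name (`default`) are placeholders you would replace with your own configuration.

```python
import json

# Hypothetical index and embedder names. The `hybrid` object is what
# tells Meilisearch to blend keyword and semantic results.
search_request = {
    "q": "melancholic piano",
    # semanticRatio: 0.0 = pure keyword search, 1.0 = pure semantic search
    "hybrid": {"semanticRatio": 0.7, "embedder": "default"},
    "limit": 10,
}

# POST this body to /indexes/songs/search with your API key.
body = json.dumps(search_request)
```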
Regarding streaming ingestion support: Meilisearch supports basic HTTP requests and is capable of batching tasks to index them faster. In v1.12 [2], we released our new indexer version, which is much faster, makes heavy use of parallel processing, and reduces disk writes.
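As a rough sketch of what streaming ingestion can look like from the client side (the URL, key, and index name below are placeholders for a local setup): you chunk your document stream and POST each chunk; Meilisearch enqueues each request as an asynchronous task and merges consecutive tasks into larger indexing batches on its own.

```python
import json
import urllib.request

MEILI_URL = "http://localhost:7700"  # assumed local instance
API_KEY = "masterKey"                # placeholder API key
INDEX_UID = "songs"                  # hypothetical index name

def chunks(docs, size):
    """Split a document stream into fixed-size batches."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_batch(batch):
    """POST one batch of documents; Meilisearch answers immediately
    with a task id and indexes asynchronously."""
    req = urllib.request.Request(
        f"{MEILI_URL}/indexes/{INDEX_UID}/documents",
        data=json.dumps(batch).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)

docs = [{"id": i, "title": f"track {i}"} for i in range(2500)]
batches = list(chunks(docs, 1000))  # 3 batches: 1000, 1000, 500
# for batch in batches:
#     send_batch(batch)  # uncomment against a running instance
```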
I'm a little confused by your statement that "Meilisearch decided to use hybrid search and avoid fusion ranking" when your website [1] says "Hybrid search re-ranking: The final step involves re-ranking results from both retrieval methods using the Reciprocal Rank Fusion (RRF) algorithm."
Can you clarify what you mean by "fusion ranking"?
All hybrid search requires a method to blend keyword and vector search results. RRF is one approach; cross-encoder-based reranking is another.
Hello, I implemented hybrid search in Meilisearch.
Whether it uses re-ranking or not depends on how you want to stretch the definition. Meilisearch does not use the rank of the documents in each list of results to compute the final list of results.
Rather, Meilisearch attributes a relevancy score to each result and then orders the results in the final list by comparing the relevancy score of the documents in each list of results.
This is usually much better than any method that uses the rank of the documents, because the rank of a document doesn't tell you if the document is relevant, only that it is more relevant than documents that ranked after it in that list of hits. As a result, these methods tend to mix good and bad results. As semantic and full-text search are complementary, one method is best for some queries and the other for different queries, and taking results by only considering their rank in their respective list of results is really bizarre to me.
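A toy sketch can make the difference concrete (this is an illustration of the argument above, not Meilisearch's actual implementation): RRF only sees ranks, so a barely relevant top-ranked keyword hit weighs the same as a highly relevant top-ranked semantic hit, while a score-based merge preserves the magnitude of relevancy.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    Only the position of each document in each list matters."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def score_merge(scored_lists):
    """Score-based fusion: keep each document's best relevancy score
    across lists (scores assumed comparable), then sort by that score."""
    best = {}
    for scored in scored_lists:
        for doc, score in scored.items():
            best[doc] = max(best.get(doc, 0.0), score)
    return sorted(best, key=best.get, reverse=True)

# Keyword search barely likes "a"; semantic search strongly likes "b".
keyword = {"a": 0.2, "c": 0.1}
semantic = {"b": 0.95, "c": 0.9}

rrf_order = rrf([["a", "c"], ["b", "c"]])       # rank-only: "c" wins, "a" ties "b"
score_order = score_merge([keyword, semantic])  # score-aware: "b" wins
```

Under RRF, the weakly relevant "a" ends up level with the highly relevant "b" because both ranked first in their list; the score-based merge keeps them apart.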
I gather other search engines might be doing it that way because they cannot produce a comparable relevancy score for both the full-text search results and the semantic search results.
I'm not sure why the website mentions Reciprocal Rank Fusion (RRF) (I'm just a dev, not in charge of this particular blog article), but it doesn't sound right to me. Maybe something got lost in translation. I'll try and have it fixed. EDIT: Reported, this is being fixed.
By the way, this way of comparing scores from multiple lists of results generalizes very well, which is how Meilisearch is able to provide its "federated search" feature, which is quite unique across search engines, I believe.
Federated search allows comparing the results of multiple queries against possibly multiple indexes or embedders.
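For a sense of the request shape, here is a minimal federated search sketch (the index names are placeholders): a single `/multi-search` call with a top-level `federation` object asks Meilisearch to merge all queries' results into one list ordered by comparable relevancy scores.

```python
import json

# Hypothetical index names. With "federation" present, results from all
# queries are merged into a single ranked list instead of being returned
# as separate per-query result sets.
federated_request = {
    "federation": {"limit": 10},
    "queries": [
        {"indexUid": "tracks", "q": "daft punk"},
        {"indexUid": "albums", "q": "daft punk"},
        {"indexUid": "artists", "q": "daft punk"},
    ],
}

# POST this body to /multi-search with your API key.
body = json.dumps(federated_request)
```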