
I was the first employee at a company which uses RAG (Halcyon), and I’ve been working through issues with various vector store providers for almost two years now. We’ve gone from tens of thousands to billions of embeddings in that timeframe - so I feel qualified to at least offer my opinion on the problem.

I agree that starting with pgvector is wise. It’s the thing you already have (postgres), and it works pretty well out of the box. But there are definitely gotchas that don’t usually get mentioned. Although the pgvector filtering story is better than it was a year ago, high-cardinality filters still feel like a bit of an afterthought (low-cardinality filters can be solved with partial indices even at scale). You should also be aware that the workload for ANN is pretty different from normal web-app stuff, so you probably want your embeddings in a separate, differently-optimized database. And if you do lots of updates or deletes, you’ll need to make sure autovacuum is properly tuned or else index performance will suffer. Finally, building HNSW indices in Postgres is still extremely slow (even with parallel index builds), so it is difficult to experiment with index hyperparameters at scale.
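For concreteness, here's a rough sketch of the partial-index approach for a low-cardinality filter, plus the kind of per-table autovacuum tuning mentioned above. The table, columns, and values are hypothetical, not recommendations:

```
-- Hypothetical table: chunks(id, embedding vector(768), archived boolean, ...)

-- Low-cardinality filter: a partial HNSW index per hot predicate value
CREATE INDEX chunks_active_embedding_idx
  ON chunks USING hnsw (embedding vector_cosine_ops)
  WHERE NOT archived;

-- Heavy update/delete workloads: vacuum this table more aggressively than the default
ALTER TABLE chunks SET (autovacuum_vacuum_scale_factor = 0.02);
```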

Dedicated vector stores often solve some of these problems but create others. Index builds are often much faster, and you’re working at a higher level (for better or worse) so there’s less time spent on tuning indices or database configurations. But (as mentioned in other comments) keeping your data in sync is a huge issue. Even if updates and deletes aren’t a big part of your workload, figuring out what metadata to index alongside your vectors can be challenging. Adding new pieces of metadata may involve rebuilding the entire index, so you need a robust way to move terabytes of data reasonably quickly. The other challenge I’ve found is that filtering is often the “special sauce” that vector store providers bring to the table, so it’s pretty difficult to reason about the performance and recall of various types of filters.




> Finally, building HNSW indices in Postgres is still extremely slow (even with parallel index builds), so it is difficult to experiment with index hyperparameters at scale

For anyone coming across this without much experience here: when building these indexes in pgvector, it makes a massive difference to increase your maintenance memory above the default. Either on a separate db like whakim mentioned, or for specific maintenance periods, depending on your use case.

```
SHOW maintenance_work_mem;
SET maintenance_work_mem = X;
```
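For a fuller picture, something like the following per-session setup is a reasonable starting point before a big HNSW build. The table name and memory value are illustrative only, and the m/ef_construction numbers are just pgvector's defaults spelled out:

```
-- Illustrative session settings before building an HNSW index in pgvector
SET maintenance_work_mem = '8GB';           -- keep more of the graph in memory during the build
SET max_parallel_maintenance_workers = 7;   -- plus the leader, for a parallel build

-- Hypothetical table; m and ef_construction shown explicitly (pgvector's defaults)
CREATE INDEX documents_embedding_hnsw_idx
  ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```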

In one of our semantic search use cases, we control the ingestion of the searchable content (laws, basically) so we can control when and how we choose to index it. And then I've set up classic relational db indexing (in addition to vector indexing) for our quite predictable query patterns.

For us that means our actual semantic db query takes about 10ms.

Starting from tens of millions of entries, filtered down to ~50k relevant ones (jurisdictionally, in our case), and then performing vector similarity search with topK/limit.

It's built into our ORM, with zero round-trip latency to Pinecone and no syncing issues.
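A rough sketch of that query shape (table and column names are hypothetical; whether Postgres serves the ORDER BY from the HNSW index or just exact-scans the ~50k filtered rows depends on the plan):

```
-- Hypothetical: law_sections(id, jurisdiction_id, body, embedding vector(256))
SELECT id, body
FROM law_sections
WHERE jurisdiction_id = $1          -- btree filter: tens of millions -> ~50k rows
ORDER BY embedding <=> $2::vector   -- cosine distance to the query embedding
LIMIT 20;                           -- topK
```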

EDIT: I imagine whakim has more experience than me and YMMV, just sharing lessons learned. Even with higher maintenance memory, index building is still super slow for HNSW.


Thanks for sharing! Yes, getting good performance out of pgvector with even a trivial amount of data requires a bit of understanding of how Postgres works.


Thank you for the comment! Compared to you I have only scratched the surface of this quite complex domain, and would love to get more of your input.

> building HNSW indices in Postgres is still extremely slow (even with parallel index builds), so it is difficult to experiment with index hyperparameters at scale.

Yes, I experienced this too. I went from 1536 down to 256 dimensions and did not try as many values as I'd have liked, because spinning up a new database and recreating the embeddings simply took too long. I'm glad it worked well enough for me, but without a quick way to experiment with these hyperparameters, who knows whether I've struck the tradeoff at the right place.

Someone on Twitter reached out and pointed out that one could quantize the embeddings to bit vectors and search with Hamming distance — supposedly the performance hit is negligible, especially if you add a quick rescore step: https://huggingface.co/blog/embedding-quantization
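In pgvector terms (assuming a recent version, roughly 0.7+, which I believe added binary_quantize and the Hamming-distance index opclass — worth verifying against the docs), that coarse-search-then-rescore pattern looks something like this sketch with hypothetical names:

```
-- Index over binary-quantized vectors (1536 is just the example dimensionality)
CREATE INDEX items_embedding_bits_idx
  ON items USING hnsw ((binary_quantize(embedding)::bit(1536)) bit_hamming_ops);

-- Coarse candidate search by Hamming distance, then rescore with full-precision cosine
SELECT id FROM (
  SELECT id, embedding
  FROM items
  ORDER BY binary_quantize(embedding)::bit(1536) <~> binary_quantize($1::vector)
  LIMIT 100
) candidates
ORDER BY embedding <=> $1::vector
LIMIT 10;
```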

> But (as mentioned in other comments) keeping your data in sync is a huge issue.

Curious if you have any good solutions in this respect.

> The other challenge I’ve found is that filtering is often the “special sauce” that vector store providers bring to the table, so it’s pretty difficult to reason about the performance and recall of various types of filters.

I realize they market heavily on this, but for open-source databases, wouldn't the fact that you can see the source code make it easier to reason about this? Or is your point that their implementations here are all custom and require much more specialized knowledge to evaluate effectively?


> Yes, I experienced this too. I went from 1536 down to 256 dimensions and did not try as many values as I'd have liked, because spinning up a new database and recreating the embeddings simply took too long. I'm glad it worked well enough for me, but without a quick way to experiment with these hyperparameters, who knows whether I've struck the tradeoff at the right place.

Yeah, this is totally a concern. You can mitigate this to some extent by testing on a representative sample of your dataset.
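One cheap way to carve out that sample in Postgres is TABLESAMPLE; a sketch with hypothetical names and an arbitrary 1% sample:

```
-- ~1% sample of the corpus for index-hyperparameter experiments
CREATE TABLE chunks_sample AS
SELECT * FROM chunks TABLESAMPLE SYSTEM (1);

-- Rebuilds here are cheap, so you can compare m/ef_construction settings for recall and latency
CREATE INDEX ON chunks_sample USING hnsw (embedding vector_cosine_ops)
  WITH (m = 24, ef_construction = 128);
```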

> Curious if you have any good solutions in this respect.

Most vector store providers have some facility to import data from object storage (e.g. s3) in bulk, so you can periodically export all your data from your primary data store, then have a process grab the exported data, transform it into the format your vector store wants, put it in object storage, and then kick off a bulk import.
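A minimal sketch of the export leg, assuming the vectors live in Postgres. The table and columns are hypothetical; the transform step and the provider-specific bulk import happen outside the database, and on managed Postgres you'd likely use an extension or COPY TO STDOUT from a client rather than a server-side file:

```
-- Dump ids, vectors, and the metadata you want filterable into a flat file
COPY (
  SELECT id,
         embedding,
         jsonb_build_object('tenant', tenant, 'updated_at', updated_at) AS metadata
  FROM chunks
) TO '/tmp/chunks_export.csv' WITH (FORMAT csv, HEADER true);
```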

> I realize they market heavily on this, but for open-source databases, wouldn't the fact that you can see the source code make it easier to reason about this? Or is your point that their implementations here are all custom and require much more specialized knowledge to evaluate effectively?

This is definitely a selling point for any open-source solution, but there are lots of dedicated vector stores which are not open source, so it is hard to really know how their filtering algorithms perform at scale.


What would you recommend for billions of embeddings?


We've used Pinecone for a while (and it definitely "just worked" up to a certain point), but we're now actively exploring Google's Vertex Search as a potential alternative if some of the pain points can't be mitigated.



