
> Yes, I experienced this too. I went from 1536 to 256 and did not try as many values as I'd have liked, because spinning up a new database and recreating the embeddings simply took too long. I'm glad it worked well enough for me, but without a quick way to experiment with these hyperparameters, who knows whether I've struck the tradeoff at the right place.

Yeah, this is totally a concern. You can mitigate this to some extent by testing on a representative sample of your dataset.
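As a sketch of what that sample-based testing could look like: truncate a sample of your existing embeddings to a candidate dimension, re-normalize, and measure how much of the full-dimension nearest-neighbor structure survives. Everything here is illustrative — the random vectors stand in for a real sample, and treating the full-dimension top-k as ground truth is just one reasonable proxy.

```python
import numpy as np

def truncate_and_normalize(embs, dim):
    """Keep the first `dim` dimensions and re-normalize to unit length."""
    t = embs[:, :dim]
    return t / np.linalg.norm(t, axis=1, keepdims=True)

def recall_at_k(full, truncated, k=10):
    """Fraction of each point's top-k neighbors (under full-dim embeddings)
    that are still in its top-k under the truncated embeddings."""
    sims_full = full @ full.T
    sims_trunc = truncated @ truncated.T
    np.fill_diagonal(sims_full, -np.inf)   # exclude self-matches
    np.fill_diagonal(sims_trunc, -np.inf)
    top_full = np.argsort(-sims_full, axis=1)[:, :k]
    top_trunc = np.argsort(-sims_trunc, axis=1)[:, :k]
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(top_full, top_trunc)]
    return float(np.mean(overlaps))

# Stand-in for a representative sample of real 1536-dim embeddings.
rng = np.random.default_rng(0)
sample = rng.normal(size=(500, 1536))
sample /= np.linalg.norm(sample, axis=1, keepdims=True)

for dim in (1536, 768, 256, 64):
    r = recall_at_k(sample, truncate_and_normalize(sample, dim))
    print(f"dim={dim:4d}  recall@10={r:.2f}")
```

On a few hundred sampled vectors this runs in seconds, which makes it cheap to sweep candidate dimensions before committing to a full re-embedding run.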

> Curious if you have any good solutions in this respect.

Most vector store providers have some facility to import data from object storage (e.g. s3) in bulk, so you can periodically export all your data from your primary data store, then have a process grab the exported data, transform it into the format your vector store wants, put it in object storage, and then kick off a bulk import.
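A minimal sketch of the transform step in that pipeline, assuming the vector store accepts a JSONL file from object storage. The field names (`id`, `vector`, `metadata`) and the input row shape are illustrative — the exact format varies by provider.

```python
import json

def to_import_records(rows):
    """Map exported rows from the primary data store into the record shape
    a bulk import typically expects. Field names here are hypothetical;
    check your provider's import format."""
    for row in rows:
        yield {
            "id": str(row["pk"]),
            "vector": row["embedding"],
            "metadata": {"title": row["title"]},
        }

def write_jsonl(rows, path):
    """Write one JSON record per line, the usual bulk-import layout."""
    with open(path, "w") as f:
        for rec in to_import_records(rows):
            f.write(json.dumps(rec) + "\n")

# From here you would upload the file to object storage, e.g. with boto3:
#   boto3.client("s3").upload_file(path, "my-bucket", "exports/batch.jsonl")
# and then kick off the provider's bulk-import job pointing at that object.
```

Running this as a periodic job keeps the vector store in sync with the primary store without hammering its write path one record at a time.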

> I realize they market heavily on this, but for open source databases, wouldn't the fact that you can see the source code make it easier to reason about this? Or is your point that their implementations here are all custom and require much more specialized knowledge to evaluate effectively?

This is definitely a selling point for any open-source solution, but there are lots of dedicated vector stores that are not open source, so it is hard to really know how their filtering algorithms perform at scale.



