Hacker News

You only have to update the metadata, not do a full reindex.




You'd have to reindex the metadata (roles access), which may be substantial if you have a complex enough schema with enough users/roles.

> You'd have to reindex the metadata (roles access), which may be substantial if you have a complex enough schema with enough users/roles.

Right, but compare this to the original proposal:

> A basic implementation will return the top, let's say 1000, documents and then do the more expensive access check on each of them

Using an index is much better than that.
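To make the problem with the quoted approach concrete, here is a minimal sketch of post-filtering. The names (`post_filter_search`, `can_access`, `fetch`) and the toy corpus are made up for illustration and do not come from any particular search engine.

```python
from typing import Callable


def post_filter_search(
    ranked_ids: list[str],
    can_access: Callable[[str], bool],
    k: int,
    fetch: int = 1000,
) -> list[str]:
    """Take the top `fetch` hits, then run the access check on each one."""
    candidates = ranked_ids[:fetch]
    allowed = [doc_id for doc_id in candidates if can_access(doc_id)]
    return allowed[:k]


# Hypothetical corpus: 2000 ranked hits, but this user may only read the last 10.
ranked = [f"doc{i}" for i in range(2000)]
readable = set(ranked[-10:])

results = post_filter_search(ranked, readable.__contains__, k=5)
print(results)  # [] because none of the top 1000 pass the access check
```

The access check runs up to 1000 times per query, and the user can still end up with no results even though readable matches exist further down the ranking.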

And it should be possible to update the index without a substantial cost, since most of the 100000 documents likely aren't changing their role access very often. You only have to reindex a document's metadata when that changes.
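A rough sketch of what "only reindex the metadata that changed" could look like, assuming a plain in-memory inverted index from role to document IDs. `RoleIndex` and its methods are hypothetical, not a real library API.

```python
from collections import defaultdict


class RoleIndex:
    """Toy inverted index: role -> set of document IDs that role may read."""

    def __init__(self) -> None:
        self._docs_by_role: dict[str, set[str]] = defaultdict(set)
        self._roles_by_doc: dict[str, set[str]] = {}

    def set_roles(self, doc_id: str, roles: set[str]) -> int:
        """Replace one document's roles; return how many postings were touched."""
        old = self._roles_by_doc.get(doc_id, set())
        for role in old - roles:
            self._docs_by_role[role].discard(doc_id)
        for role in roles - old:
            self._docs_by_role[role].add(doc_id)
        self._roles_by_doc[doc_id] = set(roles)
        return len(old ^ roles)

    def visible_to(self, user_roles: set[str]) -> set[str]:
        """Union of the postings for every role the user holds."""
        visible: set[str] = set()
        for role in user_roles:
            visible |= self._docs_by_role.get(role, set())
        return visible


index = RoleIndex()
for i in range(100_000):
    index.set_roles(f"doc{i}", {"staff"})

# Granting one extra role to one document touches a single posting,
# regardless of how many documents are in the corpus.
touched = index.set_roles("doc42", {"staff", "finance"})
print(touched)  # 1
```

The cost of an update scales with the number of roles that changed on that document, not with the size of the corpus.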

This is also far less costly than updating the actual content index (the vector embeddings) when the document content changes, which you have to do regardless of your permissions model.


I don't understand how "using an index" is a solution to this problem. If you're doing search, then you already have an index.

If you use your index to get search results, then you will have a mix of roles that you then have to filter.

If you want to filter first, then you need to make a whole new search index from scratch with the documents that came out of the filter.

You can't use the same indexing information from the full corpus to search a subset: your classical search will have undefined IDF terms and your vector search will find empty clusters.

If you want quality search results and a filter, you have to commit to reindexing your data live at query time after the filter step and before the search step.

I don't think Elastic supports this (last time I used it, it was being managed in a bizarre way, so I may be wrong). Azure AI Search does this by default. I don't know about others.


> I don't understand how "using an index" is a solution to this problem. If you're doing search, then you already have an index

It's a separate index.

You store document access rules in the metadata. These metadata fields can be indexed and then used as a pre-filter before the vector search.
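As a minimal sketch of the idea, here is an exact brute-force version in plain Python. Real engines typically apply the filter inside an approximate nearest-neighbor search rather than scanning, and `filtered_knn`, the toy vectors, and the role sets are all made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_knn(
    query: list[float],
    vectors: dict[str, list[float]],
    allowed_ids: set[str],
    k: int,
) -> list[str]:
    """Score only the documents that passed the metadata filter."""
    scored = [(cosine(query, vectors[d]), d) for d in allowed_ids if d in vectors]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical data: embeddings plus the roles stored in each document's metadata.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
doc_roles = {"a": {"admin"}, "b": {"staff"}, "c": {"staff"}, "d": {"staff", "admin"}}

user_roles = {"staff"}
allowed = {doc_id for doc_id, roles in doc_roles.items() if roles & user_roles}

top = filtered_knn([1.0, 0.0], vectors, allowed, k=2)
print(top)  # ['b', 'd']; 'a' is the closest vector but is never scored
```

The embeddings themselves are untouched. Only the candidate set shrinks, so the user always gets the k best documents they are allowed to see.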

> I don't think Elastic supports this

https://www.elastic.co/docs/solutions/search/vector/knn#knn-...



