Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Here's my use case for it: I want really small filters (say 64 bits or 128 bits), and use these to represent 50-100 element subsets of a ~1000-2000 element set.

I know with a traditional bloom filter this would give me a lot of false positives. Is there an alternative that would fare better ?



Well, what problem do you want to solve? What you describe is not a use case but a possible solution... this smells like an xy problem...


The problem is:

Represent (possibly overlapping, hence trees are tricky) subsets of size 50-100 out of a set of size of size 1000-2000.

Some amount of false positives is acceptable but not like 50%.


So, for this it sounds like you only need one Bloom filter (not multiple), and each subset is an _entry_ in the filter. The total set size doesn't matter; what matters (for the size of the Bloom filter) is the total number of entries you put into the Bloom filter, and the false positive rate. And then, you can do a membership test (with a configurable false positive rate, typical is 1%), to find out if an entry is in the set. BTW you can not use Bloom filters to store and retrieve entries: for that, you need something else (an array, or hash map, or a retrieval data structure; Bloom filter can't be used).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: