Here's my use case for it: I want really small filters (say 64 bits or 128 bits)...

thomasmg · 2025-05-02T17:28:42 1746206922

Well, what problem do you want to solve? What you describe is not a use case but a possible solution... this smells like an XY problem...

fooker · 2025-05-02T22:12:36 1746223956

The problem is:

Represent (possibly overlapping, hence trees are tricky) subsets of size 50-100 out of a set of size 1000-2000.

Some amount of false positives is acceptable but not like 50%.
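For a rough sense of what a 64-bit or 128-bit filter would give at these subset sizes, here is a small sketch using the standard Bloom filter approximation p ≈ (1 - e^(-kn/m))^k. The function names `bloom_fpr` and `best_k` are made up for illustration, and rounding m/n · ln 2 only gives a near-optimal number of hash functions, not a guaranteed optimum.

```python
import math


def bloom_fpr(m_bits, n_items, k_hashes):
    """Approximate false positive rate of a Bloom filter with
    m_bits bits, n_items inserted items and k_hashes hash functions."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def best_k(m_bits, n_items):
    """Near-optimal hash count: round (m/n) * ln 2, but use at least 1."""
    return max(1, round(m_bits / n_items * math.log(2)))


if __name__ == "__main__":
    # Subset sizes and filter sizes taken from the thread.
    for m in (64, 128):
        for n in (50, 100):
            k = best_k(m, n)
            print(f"m={m} bits, n={n} items, k={k}: "
                  f"fpr ~ {bloom_fpr(m, n, k):.0%}")
```

Under this approximation, 50 items in 64 bits already lands above 50% false positives, and 50 items in 128 bits is still close to 30%, so filters this small look too tight for subsets of this size.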

thomasmg · 2025-05-03T12:49:48 1746276588

So, for this it sounds like you only need one Bloom filter (not multiple), where each subset is an _entry_ in the filter. The total set size doesn't matter; what determines the size of the Bloom filter is the total number of entries you put into it and the false positive rate. You can then do a membership test (with a configurable false positive rate; 1% is typical) to find out if an entry is in the set. BTW, you cannot use Bloom filters to store and retrieve entries: for that, you need something else (an array, a hash map, or a retrieval data structure).
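A minimal sketch of that idea, assuming each subset is turned into a canonical byte string before insertion. The names `bloom_size`, `BloomFilter`, and `subset_key` are hypothetical, and the double hashing over a SHA-256 digest is just one convenient way to get k bit positions, not something prescribed above. The sizing uses the usual formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2.

```python
import hashlib
import math


def bloom_size(n_entries, fpr):
    """Bits (m) and hash count (k) for n_entries at target rate fpr."""
    m = math.ceil(-n_entries * math.log(fpr) / (math.log(2) ** 2))
    k = max(1, round(m / n_entries * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, n_entries, fpr=0.01):
        self.m, self.k = bloom_size(n_entries, fpr)
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Double hashing: derive k positions from two 64-bit values.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


def subset_key(subset):
    """Canonical bytes for a subset of ints (order-independent)."""
    return ",".join(map(str, sorted(subset))).encode()


if __name__ == "__main__":
    f = BloomFilter(n_entries=1000, fpr=0.01)
    f.add(subset_key({3, 17, 42}))
    print(subset_key({42, 3, 17}) in f)  # same subset, different order
    print(f.m, f.k)
```

Note that this only answers "was this exact subset added?"; it cannot list the stored subsets or test whether a single element belongs to one of them.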