
Wow, this is really fascinating. I guess it all comes down to how it's doing select and rank in constant time, which is probably some clever bit arithmetic. I'll have to look into how that works.



I can speak a bit about one of the approaches: "Practical Entropy-Compressed Rank/Select Dictionary" by Daisuke Okanohara and Kunihiko Sadakane. It presents two variants: one for dense bitsets (i.e. more than 50% of the bits are set) and one for sparse ones.

The dense implementation is based on partitioning the bitset into blocks of a fixed size, so you can jump to the block containing a position with `block[idx / block_size]`. Each block is further divided into sub-blocks that narrow the search down even more. These are additional data structures (very much like an index) stored alongside the regular bitset: you use the blocks/sub-blocks to find roughly where in the bitset you are, and then use an algorithm for finding the answer within a single machine word.
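To make that concrete, here is a minimal single-level sketch in Rust (the block size, struct name and field names are my own, not the paper's): a precomputed running popcount per block, plus popcounts over the remaining words, answers rank.

```rust
// Sketch only: one level of blocks instead of the paper's blocks + sub-blocks.
struct RankBits {
    words: Vec<u64>,      // the raw bitset, 64 bits per word
    block_rank: Vec<u64>, // number of ones before each block of BLOCK_WORDS words
}

const BLOCK_WORDS: usize = 8; // 512-bit blocks, an arbitrary choice here

impl RankBits {
    fn new(words: Vec<u64>) -> Self {
        let mut block_rank = Vec::with_capacity(words.len() / BLOCK_WORDS + 1);
        let mut total = 0u64;
        for (i, w) in words.iter().enumerate() {
            if i % BLOCK_WORDS == 0 {
                block_rank.push(total);
            }
            total += w.count_ones() as u64;
        }
        RankBits { words, block_rank }
    }

    // rank1(idx): number of set bits in positions [0, idx).
    // Assumes idx is less than the length of the bitset.
    fn rank1(&self, idx: usize) -> u64 {
        let word_idx = idx / 64;
        let block_idx = word_idx / BLOCK_WORDS;
        // Start from the precomputed count at the block boundary...
        let mut r = self.block_rank[block_idx];
        // ...add whole words between the block start and the target word...
        for w in &self.words[block_idx * BLOCK_WORDS..word_idx] {
            r += w.count_ones() as u64;
        }
        // ...and finish with a masked popcount inside the target word.
        let rem = idx % 64;
        if rem > 0 {
            r += (self.words[word_idx] & ((1u64 << rem) - 1)).count_ones() as u64;
        }
        r
    }
}
```

The point of the sub-block level in the real structure is to shrink the per-block counters and bound how many words the final popcount loop has to touch; the shape of the query stays the same.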

The sparse implementation treats the bitset as an ordered list of numbers (e.g. 100001001 is interpreted as 0, 5, 8) and then stores those numbers using Elias-Fano encoding. The high bits of the Elias-Fano encoding happen to form a dense bitvector, so we can use the previous dense implementation on those high bits and then combine the result with the low bits.
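Here is a hedged sketch of the encoding side, using that 0, 5, 8 example (function and variable names are mine; real implementations pack the low bits tightly and use a proper rank/select bitvector for the high part):

```rust
// Simplified Elias-Fano encoder: returns (low bits per value, high-part
// bitvector, number of low bits). `universe` is the length of the bitset.
fn elias_fano_encode(values: &[u64], universe: u64) -> (Vec<u64>, Vec<bool>, u32) {
    let n = values.len() as u64;
    // Number of low bits: roughly log2(universe / n).
    let low_bits: u32 = if universe > n { (universe / n).ilog2() } else { 0 };

    // Low bits are stored verbatim, one chunk per value.
    let lows: Vec<u64> = values
        .iter()
        .map(|&v| v & ((1u64 << low_bits) - 1))
        .collect();

    // The high parts become a *dense* unary-coded bitvector: the i-th value's
    // high part h is recorded by setting bit (h + i). This is the vector the
    // dense rank/select structure from above is used on.
    let mut highs = vec![false; (n + (universe >> low_bits)) as usize + 1];
    for (i, &v) in values.iter().enumerate() {
        let h = (v >> low_bits) as usize;
        highs[h + i] = true;
    }
    (lows, highs, low_bits)
}
```

Recovering the i-th value is then a select on the dense high bitvector: if its i-th set bit sits at position p, the value is ((p - i) << low_bits) | lows[i]. For the example above (low_bits = 1), the second value decodes as ((3 - 1) << 1) | 1 = 5.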

I'm also aware of https://arxiv.org/abs/1706.00990, which is more about how to do this efficiently at the machine-word level.
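For a flavour of that word-level piece, here is a portable (and deliberately naive) sketch of select within a single u64; the fast versions use broadword tricks or the BMI2 PDEP instruction (select(x, k) = trailing_zeros(pdep(1 << k, x))) rather than a loop:

```rust
// Position of the k-th (0-indexed) set bit of x.
// Assumes x has more than k set bits.
fn select_in_word(mut x: u64, mut k: u32) -> u32 {
    let mut pos = 0u32;
    // Narrow down to the byte containing the k-th set bit.
    while (x as u8).count_ones() <= k {
        k -= (x as u8).count_ones();
        x >>= 8;
        pos += 8;
    }
    // Then scan that byte bit by bit.
    while k > 0 || x & 1 == 0 {
        if x & 1 == 1 {
            k -= 1;
        }
        x >>= 1;
        pos += 1;
    }
    pos
}
```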

"Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors" is another quite recent paper which I haven't fully digested yet.


Some of it is moving or has moved down to the instruction sets: https://vaibhavsagar.com/blog/2019/09/08/popcount/





