Wow, this is really fascinating. I guess it all comes down to how it's doing select and rank in constant time, which is probably some clever bit arithmetic. I'll have to look into how that works.
I can speak a bit about one of the approaches: "Practical Entropy-Compressed Rank/Select Dictionary" by Daisuke Okanohara and Kunihiko Sadakane. It presents two variants: one for dense bitsets (i.e. more than 50% of the bits are set) and one for sparse ones.
The dense implementation partitions the bitset into "blocks" of a fixed size, so you can find the right block with `block[idx / block_size]`, and then splits each block further into sub-blocks. These blocks/sub-blocks are additional data structures (very much like an index) stored next to the regular bitset: they hold precomputed bit counts that get you roughly to the right place, and then you finish with an algorithm (typically popcount) for counting bits within a single machine word.
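To make that concrete, here's a toy rank structure along those lines. All the names (`RankIndex`, `BLOCK_BITS`, etc.) and the specific sizes are mine, not from the paper; this is just a sketch of the block/sub-block idea, not the paper's actual layout.

```python
BLOCK_BITS = 512   # bits covered by one top-level block (8 words)
SUB_BITS = 64      # bits covered by one sub-block (one machine word)

class RankIndex:
    def __init__(self, words):  # bitset given as a list of 64-bit ints
        self.words = words
        self.block = []  # set bits before each top-level block
        self.sub = []    # set bits before each sub-block, within its block
        total = within = 0
        for i, w in enumerate(words):
            if i % (BLOCK_BITS // SUB_BITS) == 0:
                self.block.append(total)
                within = 0
            self.sub.append(within)
            pc = bin(w).count("1")  # stand-in for a hardware popcount
            total += pc
            within += pc

    def rank1(self, idx):  # number of set bits in positions [0, idx)
        word, bit = divmod(idx, SUB_BITS)
        partial = self.words[word] & ((1 << bit) - 1)
        return (self.block[word // (BLOCK_BITS // SUB_BITS)]
                + self.sub[word]
                + bin(partial).count("1"))
```

A query is O(1): one block lookup, one sub-block lookup, one popcount of a masked word. The space overhead is a tunable trade-off against block size.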
The sparse implementation treats the bitset as an ordered list of numbers (e.g. 100001001 is interpreted as 0, 5, 8) and then stores those numbers using Elias-Fano encoding. The higher bits of the Elias-Fano encoding happen to form a dense bitvector, so we can use the previous dense implementation on those higher bits and then combine the result with the lower bits.
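Here's a toy version of that split. The variable names are mine and the select step is a naive scan; in the real structure you'd answer it with the dense rank/select index described above.

```python
import math

def elias_fano(values):  # values: sorted, non-negative integers
    n, u = len(values), values[-1] + 1
    low_bits = max(0, int(math.log2(u / n)))  # split point between low/high bits
    lows = [v & ((1 << low_bits) - 1) for v in values]  # low bits, stored verbatim
    upper = 0  # high bits, unary-coded: i-th value sets bit (high_i + i)
    for i, v in enumerate(values):
        upper |= 1 << ((v >> low_bits) + i)
    return low_bits, lows, upper

def access(low_bits, lows, upper, i):  # recover the i-th value
    # Find the (i+1)-th set bit of `upper` (naive scan; a real
    # implementation uses the dense select structure here).
    seen = pos = 0
    while True:
        if (upper >> pos) & 1:
            if seen == i:
                break
            seen += 1
        pos += 1
    # Subtracting i undoes the unary shift, recovering the high bits.
    return ((pos - i) << low_bits) | lows[i]
```

For the example bitset (values 0, 5, 8), one low bit is stored per value and the upper bitvector has set bits at positions 0, 3, and 6, so roughly half its bits are set, which is exactly the regime the dense structure handles well.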