
You should be careful when using asymptotic bounds with more precision than about O(sqrt(n)). The bounds ignore constant factors, and the constant factors could be more significant than slowly growing non-constant factors for reasonable values of n.
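To make that concrete, here is a quick sanity check (a sketch, not a benchmark): for any realistic input size, log log n stays tiny, so even a modest constant factor outweighs it.

```python
import math

# Illustration: log2(log2(n)) for realistic input sizes. An O(n log log n)
# algorithm with a small constant can easily beat an O(n) algorithm with a
# larger one, because the log log n factor never gets big in practice.
for bits in (16, 32, 64):
    n = 2 ** bits
    print(f"n = 2^{bits}: log2(log2(n)) = {math.log2(bits):.2f}")
# Up to n = 2^64, log2(log2(n)) never exceeds 6, so a constant factor of 6
# already dominates it.
```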

It's also very common in algorithm design that improving the asymptotic bounds and making the algorithm faster (or the data structure smaller) are orthogonal (or even opposite) goals. Real computers have complex performance characteristics and fixed word lengths, and it's rarely a good idea to implement a theoretical algorithm exactly as described.

Succinct data structures often have a number of internal parameters. In a theoretical parameterization, those parameters may be described as being O(log n), O(log^2 n), or O(log log n). In a concrete implementation, it is often a good idea to replace them with constant approximations for nontrivial (such as 2x) performance gains. O(log n) could become 32 or 64, O(log^2 n) could become 1024 (or a power-of-2 multiple of it), and O(log log n) could become 4 or 8.

And then, if you parameterize the succinct data structure with these constants, the space overhead becomes a constant fraction.
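A back-of-envelope version of that claim (the counter widths below are illustrative assumptions, not from any particular implementation): with constant block sizes, the extra bits per bitvector bit no longer depend on n.

```python
# Hypothetical overhead estimate for a two-level rank index with constant
# parameters: one superblock counter per b1 bits and one in-superblock
# counter per b2 bits. Counter widths (64 and 16 bits) are assumptions.
def rank_overhead(b1, b2, super_counter_bits=64, block_counter_bits=16):
    """Extra index bits per bitvector bit; constant, independent of n."""
    return super_counter_bits / b1 + block_counter_bits / b2

print(rank_overhead(512, 64))     # 0.375, i.e. 37.5% overhead
print(rank_overhead(65536, 512))  # ~0.032, i.e. ~3.2% overhead
```

Both results are constant fractions of n, which is exactly the point: the parameterization, not the data structure, decides whether the overhead is sublinear.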




Succinct data structures require the extra space (above the information-theoretical minimum) to be an additive term of o(n) bits (little O, not big O). That just means that the extra space grows more slowly than n, as n approaches infinity, so their ratio (extra space)/n approaches 0 in the limit.


That's a simplification. Succinct data structures are usually parameterized data structures. They have a number of parameters, such as block sizes and sampling rates, that govern the space overhead and query performance. The published version may use a parameterization that makes the space overhead sublinear while guaranteeing attractive asymptotic bounds for worst-case performance. But if you actually implement that, the performance is likely terrible.

Consider the rank data structure for bitvectors that is supposed to guarantee O(1) time queries with o(n) bits of space overhead. The bitvector is divided into superblocks of b1 bits and blocks of b2 bits. The top-level index, which stores the rank up to each superblock, uses O(n log n / b1) bits of space. The second-level index, which stores the rank within the superblock up to each block, uses O(n log b1 / b2) bits of space. And then you need to do linear search within the block, which is typically assumed to take O(b2 / log n) or O(b2 / w) time, where w is word size. If you choose b1 = log^2 n and b2 = log n, you get O(1) time with O(n log log n / log n) bits of space overhead. Which is technically sublinear but effectively indistinguishable from linear with realistic values of n.
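You can see just how slowly that o(n) term shrinks by plugging in realistic sizes (ignoring the constants hidden by the O-notation):

```python
import math

# The O(n log log n / log n) overhead as a fraction of n, for realistic
# input sizes. Constants hidden by the O(.) are ignored.
def overhead_fraction(n):
    return math.log2(math.log2(n)) / math.log2(n)

for bits in (20, 32, 40, 64):
    n = 2 ** bits
    print(f"n = 2^{bits}: overhead ~ {overhead_fraction(n):.3f} * n bits")
```

Even at n = 2^64 the fraction is still around 9%, and at n = 2^32 it is about 16%: sublinear in the limit, linear for all practical purposes.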

Real-world implementations use constant values for the parameters. Typically the goal is to make the blocks and indexes align with cache lines and to replace arbitrary divisions with divisions by compile-time power-of-2 constants. Some values I've seen are (b1 = 512, b2 = 64) and (b1 = 65536, b2 = 512). In both cases, the overhead is linear.
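A minimal sketch of the (b1 = 512, b2 = 64) variant, assuming the bitvector is stored as 64-bit words with LSB-first bit positions (so 8 words per superblock, 1 word per block); a tuned implementation would pack the second-level counters into narrow fields and use hardware popcount:

```python
WORDS_PER_SUPERBLOCK = 8  # b1 / b2 = 512 / 64

class RankBitvector:
    """Two-level rank index with constant parameters b1=512, b2=64."""
    def __init__(self, words):
        self.words = words
        self.super_ranks = []  # rank of 1-bits before each superblock
        self.block_ranks = []  # rank within the superblock before each word
        total = 0
        for i, w in enumerate(words):
            if i % WORDS_PER_SUPERBLOCK == 0:
                self.super_ranks.append(total)
            self.block_ranks.append(total - self.super_ranks[-1])
            total += bin(w).count("1")

    def rank1(self, pos):
        """Number of 1-bits in positions [0, pos)."""
        word = pos // 64
        r = (self.super_ranks[word // WORDS_PER_SUPERBLOCK]
             + self.block_ranks[word])
        mask = (1 << (pos % 64)) - 1  # bits below pos within the word
        return r + bin(self.words[word] & mask).count("1")
```

Both lookups and the final masked popcount are O(1), and because the second-level counters here would fit in 16 bits (ranks within a 512-bit superblock), the index costs a constant fraction of n, matching the linear overhead described above.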

And sometimes the implemented data structure is a simplification, because the nominally succinct data structure is too large and slow. For example, it's rare to see actual O(1)-time implementations of select on bitvectors. That would require three levels of indexes with many special cases. It's more common to use two levels of indexes, with queries that are almost always constant-time but have (poly)logarithmic worst cases with adversarial inputs.
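The common simplification looks roughly like this (a sketch over a plain 0/1 list for clarity; real implementations scan word-by-word with popcount): sample the position of every k-th 1-bit, then scan forward from the nearest sample.

```python
SAMPLE_RATE = 512  # store the position of every 512th 1-bit

class SelectBitvector:
    """Sampled select: usually fast, but the scan between samples can be
    long on adversarial inputs (e.g. very sparse 1-bits)."""
    def __init__(self, bits):
        self.bits = bits      # list of 0/1 for clarity
        self.samples = []     # position of every SAMPLE_RATE-th 1-bit
        ones = 0
        for i, b in enumerate(bits):
            if b:
                if ones % SAMPLE_RATE == 0:
                    self.samples.append(i)
                ones += 1

    def select1(self, k):
        """Position of the k-th 1-bit (0-indexed)."""
        pos = self.samples[k // SAMPLE_RATE]  # jump to the nearest sample
        remaining = k % SAMPLE_RATE
        while True:                           # linear scan from the sample
            if self.bits[pos]:
                if remaining == 0:
                    return pos
                remaining -= 1
            pos += 1
```

The jump to the sample is O(1); the scan is what produces the non-constant worst case the comment mentions, since an adversary can put up to SAMPLE_RATE ones arbitrarily far apart.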


This is really good information! Thanks for writing it up.

Honestly, I never actually "trust" the complexity analysis. Whenever I find a paper, I immediately look for specific experimental results, and if I can't find any, I assume the paper is only of theoretical interest. There's of course still a lot to learn from a purely theoretical paper, but I've always ended up disappointed when I've implemented something that only had "good" asymptotic bounds.


You're absolutely right and my response completely missed your point, thanks for clarifying further.



