Hacker News

I'm very sceptical that in-memory B-trees can beat hashes, given their huge size overhead and cache unfriendliness.



Quoting the README: "B-trees were originally invented in the 1970s as a data structure for slow external storage devices. As such, they are strongly optimized for locality of reference: they prefer to keep data in long contiguous buffers and they keep pointer dereferencing to a minimum. (Dereferencing a pointer in a B-tree usually meant reading another block of data from the spinning hard drive, which is a glacially slow device compared to the main memory.)"

Sounds like it could be pretty cache friendly. Besides, a B-tree can be used to implement a hash/dict/map.


I know B-trees. Still, Patricia trees, or their optimized variant Judy hashes, are more cache friendly, and hashes that aren't fucked up even more so. For an OrderedDict a B-tree makes sense, but I would still consider Judy or Patricia better.


Tries are awesome, but they're more specialized. So, as long as you have to choose which one to implement first (and I do), B-trees provide better bang for the buck.


B-trees don't beat hash tables at their own game, but hash tables don't beat B-trees at theirs either. Both are general-use data structures that have their particular niches. The trick is to always select the correct tool for the job.
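To make the "niches" point concrete, here is a rough Python sketch of a range query, the kind of job ordered structures are good at. A sorted list with `bisect` stands in for a B-tree here (that substitution is my assumption, not how any particular B-tree library works), and the function names are made up for illustration. The ordered version finds the range with two binary searches; the hash version has no order to exploit, so it has to look at every key.

```python
import bisect


def range_query_sorted(sorted_keys, lo, hi):
    """Return the keys in [lo, hi) from an already sorted list.

    Two binary searches locate the slice boundaries, so the work is
    O(log n) plus the size of the result.
    """
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    return sorted_keys[start:end]


def range_query_hashed(table, lo, hi):
    """Return the keys in [lo, hi) from a dict, in ascending order.

    A hash table keeps no key order, so every key is inspected and the
    matches are sorted afterwards.
    """
    return sorted(k for k in table if lo <= k < hi)


keys = list(range(0, 100, 5))          # 0, 5, 10, ..., 95 (already sorted)
table = {k: str(k) for k in keys}      # same keys in a hash table

in_order = range_query_sorted(keys, 10, 30)
from_hash = range_query_hashed(table, 10, 30)
```

Point lookups go the other way: `table[25]` is a single expected O(1) probe, while the ordered structure needs a search.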

You might be thinking about red-black trees. The size overhead of B-trees is the same as or better than that of hash tables. As for cache friendliness, B-trees were explicitly designed to work great on two-level storage; the advantage is clearly theirs.
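A back-of-the-envelope sketch of why node size matters for pointer chasing, in Python. It counts how many levels a completely packed search tree needs to hold a given number of keys. The helper name, the packed-tree assumption, and the node size of 63 keys are all mine, picked for illustration; real B-tree nodes are usually only partly full, and a red-black tree can be deeper than a perfectly balanced binary tree.

```python
def levels_needed(n_keys, keys_per_node):
    """Smallest number of levels in a fully packed search tree holding n_keys.

    Each node stores keys_per_node keys and has keys_per_node + 1 children.
    keys_per_node == 1 models a perfectly balanced binary tree.
    """
    children = keys_per_node + 1
    levels = 0
    capacity = 0          # keys that fit in the levels counted so far
    nodes_at_level = 1    # the root level has a single node
    while capacity < n_keys:
        capacity += nodes_at_level * keys_per_node
        nodes_at_level *= children
        levels += 1
    return levels


binary_levels = levels_needed(1_000_000, 1)    # one key per node
wide_levels = levels_needed(1_000_000, 63)     # hypothetical 63-key nodes
```

Under these assumptions a million keys need 20 levels with one key per node but only 4 levels with 63-key nodes, so a lookup follows far fewer pointers, and each node it does visit is one contiguous buffer.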



