
Yes, 30ns means it's in cache. But Bloom filters are surprisingly fast for the number of memory accesses they do, because those accesses all happen in parallel and there is a lot of parallelism in modern memory subsystems, so you essentially only pay the cost of a single random read for the entire lookup. If using 1GB pages, you can still realistically talk about <100ns lookups.



> so that you essentially only pay the cost of a single random read for the entire lookup

Why would you ever pay more than that for a bloom filter lookup? I mean, I don’t see how that has anything to do with parallelism in memory subsystems. But I may be missing something.


A Bloom filter needs to do multiple loads from different memory locations for each single lookup (7, in the example's 1.2GB filter). But unlike, say, with a tree, it knows all the addresses after computing the hashes, without having to wait for results from the previous loads. So it can start all of them in parallel.
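To make the point concrete, here's a minimal sketch (names and sizes are my own, not from the thread) that shows why the probes are independent: all k bit positions are derived from the hashes before any memory is touched, so nothing forces the loads to be sequential.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: k independent bit probes per lookup."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _indexes(self, item):
        # Derive k bit positions from two 64-bit halves of one hash
        # (the Kirsch-Mitzenmacher double-hashing trick). All addresses
        # are known up front -- no probe depends on a previous probe's
        # result, unlike walking a tree.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        # The k loads are independent, so the memory subsystem can issue
        # them concurrently; the lookup latency is roughly one random read.
        return all(self.bits[idx // 8] >> (idx % 8) & 1
                   for idx in self._indexes(item))
```

In hardware terms, the CPU's out-of-order engine and the memory controller can have all seven cache-line fetches in flight at once, which is why the whole lookup costs about as much as one miss.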


Yeah, sorry. I was thinking of a single-hash-function Bloom filter, which is of course the exception. My bad.





