A whole load of data structures, like Bloom filters, are 'efficiency tricks'.
For example, "check if we might have Mr Smith's data cached by looking in this Bloom filter". As long as the answer is usually right, that's good enough.
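To make that concrete, here's a minimal Bloom filter sketch in Python (purely illustrative, not anyone's production code). The key property is that a check can return a false positive, but never a false negative, so "usually right" is exactly what you get.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive several bit positions from one digest; real implementations
        # would use faster, independent hash functions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False -> definitely not cached; True -> probably cached.
        return all(self.bits[pos] for pos in self._positions(key))

cache_filter = BloomFilter()
cache_filter.add("Mr Smith")
if cache_filter.might_contain("Mr Smith"):
    print("Probably cached - worth doing the expensive lookup")
```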
I do wonder if in the future we'll use a mini online-trained AI to achieve the same thing. "Ask the AI if we have Mr Smith's data cached".
Rather like you might ask an office clerk "Is Mr Smith that troublesome customer you were dealing with last week? Do you still have his file on your desk?"
The way that would work is that the LLM would translate that sentence into a tool call, which would query a data store that does the heavy lifting. Also, there is an ML task called "Learning to hash", which is about learning hash functions tailored to the data and workload rather than using generic ones: https://learning2hash.github.io/
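One concrete shape the "mini AI" idea can take is what the literature sometimes calls a learned Bloom filter: a small model scores the key first, and a conventional backup structure catches the members the model misses, so there are still no false negatives. The sketch below is purely illustrative; the scoring function is a stand-in for a real online-trained model and the backup is just a set.

```python
def model_score(key):
    # Stand-in for a small online-trained model; here just a toy heuristic
    # that "remembers" a prefix it has seen a lot.
    return 0.9 if key.startswith("Mr Smith") else 0.1

backup_filter = {"Mrs Jones"}  # members the model tends to score too low

def might_be_cached(key, threshold=0.5):
    if model_score(key) >= threshold:
        return True                 # model says "probably cached"
    return key in backup_filter     # backup catches the model's misses

print(might_be_cached("Mr Smith"))   # True via the model
print(might_be_cached("Mrs Jones"))  # True via the backup
print(might_be_cached("Ms Doe"))     # False -> definitely take the slow path
```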
I would imagine a heuristic that keeps enough of the filter in L2 cache to avoid pipeline stalls might be useful. Sort of a double Bloom filter, but weighted toward common lookups.
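A rough sketch of what that "double bloom" might look like: a small front bit array sized so it could plausibly stay resident in L2 cache, holding only frequently looked-up keys, with the full-size filter behind it. The sizes, the single hash function, and the promotion threshold below are all invented for illustration.

```python
import zlib

SMALL_BITS = 8 * 32 * 1024        # 32 KiB front array, cache-friendly
LARGE_BITS = 8 * 8 * 1024 * 1024  # 8 MiB main array, lives in RAM

small = bytearray(SMALL_BITS // 8)
large = bytearray(LARGE_BITS // 8)
hits = {}

def _bit(key, num_bits):
    # Single hash for brevity; a real Bloom filter would use several.
    return zlib.crc32(key.encode()) % num_bits

def _set(arr, key, num_bits):
    b = _bit(key, num_bits)
    arr[b // 8] |= 1 << (b % 8)

def _test(arr, key, num_bits):
    b = _bit(key, num_bits)
    return bool(arr[b // 8] & (1 << (b % 8)))

def add(key):
    _set(large, key, LARGE_BITS)

def might_contain(key):
    if _test(small, key, SMALL_BITS):
        return True                        # answered from the cache-resident array
    hits[key] = hits.get(key, 0) + 1
    present = _test(large, key, LARGE_BITS)
    if present and hits[key] >= 3:         # promote hot keys to the small array
        _set(small, key, SMALL_BITS)
    return present
```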