I am enamored by data structures in the sketch/summary/probabilistic family: t-digest[1], q-digest[2], count-min sketch[3], matrix-sketch[4], graph-sketch[5][6], Misra-Gries sketch[7], top-k/spacesaving sketch[8], &c.
What I like about them is that they give me a set of engineering tradeoffs that I typically don't have access to: accuracy-speed[9] or accuracy-space. There have been too many times that I've had to say, "I wish I could do this, but it would take too much time/space to compute." Most of these problems still work even if the accuracy is not 100%. And furthermore, many (if not all of these) can tune accuracy to by parameter adjustment anyways. They tend to have favorable combinatorial properties ie: they form monoids or semigroups under merge operations. In short, a property of data structures that gave me the ability to solve problems I couldn't before.
I hope they are as useful or intriguing to you as they are to me.
9. It may better be described as error-speed and error-space, but I've avoided the term error because the term for programming audiences typically evokes the idea of logic errors and what I mean is statistical error.
What I like about them is that they give me a set of engineering tradeoffs that I typically don't have access to: accuracy-speed[9] or accuracy-space. There have been too many times that I've had to say, "I wish I could do this, but it would take too much time/space to compute." Most of these problems still work even if the accuracy is not 100%. And furthermore, many (if not all of these) can tune accuracy to by parameter adjustment anyways. They tend to have favorable combinatorial properties ie: they form monoids or semigroups under merge operations. In short, a property of data structures that gave me the ability to solve problems I couldn't before.
I hope they are as useful or intriguing to you as they are to me.
1. https://github.com/tdunning/t-digest
2. https://pdsa.readthedocs.io/en/latest/rank/qdigest.html
3. https://florian.github.io/count-min-sketch/
4. https://www.cs.yale.edu/homes/el327/papers/simpleMatrixSketc...
5. https://www.juanlopes.net/poly18/poly18-juan-lopes.pdf
6. https://courses.engr.illinois.edu/cs498abd/fa2020/slides/20-...
7. https://people.csail.mit.edu/rrw/6.045-2017/encalgs-mg.pdf
8. https://www.sciencedirect.com/science/article/abs/pii/S00200...
9. It may better be described as error-speed and error-space, but I've avoided the term error because the term for programming audiences typically evokes the idea of logic errors and what I mean is statistical error.