Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I am enamored by data structures in the sketch/summary/probabilistic family: t-digest[1], q-digest[2], count-min sketch[3], matrix-sketch[4], graph-sketch[5][6], Misra-Gries sketch[7], top-k/spacesaving sketch[8], &c.

What I like about them is that they give me a set of engineering tradeoffs that I typically don't have access to: accuracy-speed[9] or accuracy-space. There have been too many times that I've had to say, "I wish I could do this, but it would take too much time/space to compute." Most of these problems still work even if the accuracy is not 100%. And furthermore, many (if not all of these) can tune accuracy to by parameter adjustment anyways. They tend to have favorable combinatorial properties ie: they form monoids or semigroups under merge operations. In short, a property of data structures that gave me the ability to solve problems I couldn't before.

I hope they are as useful or intriguing to you as they are to me.

1. https://github.com/tdunning/t-digest

2. https://pdsa.readthedocs.io/en/latest/rank/qdigest.html

3. https://florian.github.io/count-min-sketch/

4. https://www.cs.yale.edu/homes/el327/papers/simpleMatrixSketc...

5. https://www.juanlopes.net/poly18/poly18-juan-lopes.pdf

6. https://courses.engr.illinois.edu/cs498abd/fa2020/slides/20-...

7. https://people.csail.mit.edu/rrw/6.045-2017/encalgs-mg.pdf

8. https://www.sciencedirect.com/science/article/abs/pii/S00200...

9. It may better be described as error-speed and error-space, but I've avoided the term error because the term for programming audiences typically evokes the idea of logic errors and what I mean is statistical error.



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: