Here are a few fun ones: the Burrows–Wheeler transform, suffix arrays in general, Kanerva's fully distributed representation, reduced-affine-arithmetic numbers, LSM trees, binary array sets, sparse array sets, and gap buffers. (I'm not sure how nobody has mentioned gap buffers on the thread yet, but if they did, I didn't see it.)
— ⁂ —
The Burrows–Wheeler transform is the character immediately preceding each suffix in a (cyclic) suffix array (equivalently, the last column of the sorted rotations), and there turns out to be a simple algorithm to recover the original text from this transform; live demo: http://canonical.org/~kragen/sw/bwt. But the transformed text is very easily compressible, which is the basis for bzip2. As mentioned by Blahah and dahart, this is only one of many interesting things you can do with suffix arrays; they can also do efficient regular expression search, for example. Suffix arrays became much more practical once three linear-time, linear-space construction algorithms were discovered in the early 02000s.
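Here is a tiny Python sketch of the textbook way to compute and invert the transform, sorting whole rotations and using a NUL byte as an end-of-text sentinel; it is quadratic-ish and not necessarily how the linked demo does it, but it shows the mechanics:

    def bwt(s):
        # Sort all cyclic rotations of s; the transform is the last column
        # of the sorted rotation table.
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        return ''.join(r[-1] for r in rotations)

    def ibwt(last):
        # Naive inversion: repeatedly prepend the transform column and
        # re-sort; after len(last) rounds the table holds every sorted
        # rotation, and the row ending in the sentinel is the original text.
        table = [''] * len(last)
        for _ in range(len(last)):
            table = sorted(last[i] + table[i] for i in range(len(last)))
        return next(row for row in table if row.endswith('\0'))

    text = 'banana\0'
    assert bwt(text) == 'annb\0aa'    # equal characters tend to cluster
    assert ibwt(bwt(text)) == text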
— ⁂ —
Pentti Kanerva devised a "fully distributed representation" in the 01990s, which has some features in common with bloom filters — every association stored in an FDR record is stored to an equal extent (probabilistically) in every bit of the record, so erasing some of the bits erases none of the associations but increases the error probability for all of them. You assign a random or pseudorandom bitvector (of, say, 16384 bits) to every atomic value to be stored, represent an association of two (or more) atoms as their XOR, and then take the bitwise majority rule of all the associations to be stored as the record. To retrieve an association, you XOR the record with an atom, then compute the correlation of the XOR result with all possible atoms; the correlation with the correct atoms will be many standard deviations away from the mean, unless the number of associations stored in the record is high enough that conventional data structures wouldn't have been able to store it either.
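Here's a toy numpy sketch of the scheme as I understand it, with made-up atom names and a crude tie-breaking rule; a real vector-symbolic system would be more careful, but it shows the everything-stored-everywhere behavior:

    import numpy as np

    N_BITS = 16384
    rng = np.random.default_rng(0)

    def atom():
        # Each atomic symbol gets a random dense bitvector.
        return rng.integers(0, 2, N_BITS, dtype=np.uint8)

    def record(associations):
        # Each association is the XOR of its atoms; the record is the bitwise
        # majority vote over all the associations (ties broken here by
        # throwing in one extra random vector).
        pairs = [a ^ b for a, b in associations]
        if len(pairs) % 2 == 0:
            pairs.append(atom())
        counts = np.sum(np.asarray(pairs, dtype=np.int32), axis=0)
        return (2 * counts > len(pairs)).astype(np.uint8)

    def lookup(rec, probe, atoms):
        # XOR the record with one atom of an association, then pick the stored
        # atom that agrees with the result on the most bits; the right partner
        # agrees on far more bits than the chance level of ~50%.
        noisy = rec ^ probe
        return max(atoms, key=lambda name: np.sum(noisy == atoms[name]))

    atoms = {name: atom() for name in ['red', 'blue', 'cat', 'dog', 'car']}
    rec = record([(atoms['red'], atoms['cat']), (atoms['blue'], atoms['dog'])])
    assert lookup(rec, atoms['red'], atoms) == 'cat'
    assert lookup(rec, atoms['dog'], atoms) == 'blue'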
This and some related work have blossomed into a field called "hyperdimensional computing" or "vector symbolic architectures". But the problem of finding the nearest atom or atoms seems, on its face, to make this impractical: if you have 16384 possible atoms and 16384 bits in your bitvector, you would seem to need at least 16384 × 16384 = 268435456 bit operations to compute the correlations with all the atoms.
I realized the other night that if you use the rows of a Walsh–Hadamard matrix as the "pseudorandom" atom representations, you can compute all the correlations in linearithmic time with the fast Walsh–Hadamard transform. Moreover, Cohn and Lempel's 01977 result "On Fast M-Sequence Transforms" is that after a simple transposition you can also use the FWHT to compute all the correlations with the complete output of a maximal LFSR (a so-called "M-sequence") in linearithmic time, so you can also use the cyclic shifts of an LFSR output for your atoms without losing efficiency. Probably someone else has figured this out before, but I haven't found it in the literature.
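For example, here's a small numpy sketch (just the standard in-place FWHT and the Sylvester construction of the Hadamard matrix, nothing from Cohn and Lempel) showing that one O(n log n) transform recovers the correlations of a noisy query with all n atoms at once:

    import numpy as np

    def fwht(v):
        # Iterative fast Walsh-Hadamard transform in natural (Sylvester)
        # order: O(n log n) instead of the O(n^2) matrix-vector product.
        v = v.astype(np.int64)
        h = 1
        while h < len(v):
            for i in range(0, len(v), 2 * h):
                a, b = v[i:i+h].copy(), v[i+h:i+2*h].copy()
                v[i:i+h], v[i+h:i+2*h] = a + b, a - b
            h *= 2
        return v

    n = 1024                                   # must be a power of two
    H = np.array([[1]])
    while H.shape[0] < n:                      # Sylvester construction;
        H = np.block([[H, H], [H, -H]])        # rows are the ±1 "atoms"

    rng = np.random.default_rng(1)
    k = 321
    query = H[k] * np.where(rng.random(n) < 0.9, 1, -1)  # H[k], ~10% of bits flipped
    corr = fwht(query)                         # all n correlations at once
    assert np.argmax(corr) == k                # the right atom stands out
    assert np.array_equal(corr, H @ query)     # matches the slow product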
— ⁂ —
I'm not sure if the quantities used in reduced affine arithmetic count as a "data structure"? They're a data type, at least, but they're not a container. The idea is that you execute some arbitrary numerical algorithm, not merely on floating-point numbers, or even on dual numbers as in forward-mode automatic differentiation, nor even yet on (min, max) intervals as in ordinary interval arithmetic, but on a vector of coefficients [a₀, a₁, a₂, ..., aₙ], which represents a linear function a₀x₀ + a₁x₁ + a₂x₂ + ... + aₙxₙ, where x₀ = 1 and each of the other xᵢ ∈ [-1, 1], so this function is really a representation of an interval. Then you can assign each of the xᵢ (except x₀ and xₙ) to be the unknown error in one of the input parameters to your algorithm, whose magnitude is given by that parameter's aᵢ coefficient; and when you introduce rounding error, you increase aₙ enough to account for it. (The non-reduced kind of affine arithmetic just increases n without limit to account for new roundoff errors.)
Reduced affine arithmetic has two advantages over the usual kind of interval arithmetic. First, correlated errors cancel the way they should: if you calculate (y+1)(y+2)/(y+3) with interval arithmetic, with some uncertainty in the value of y, you usually get an unnecessarily loose bound on the result, because interval arithmetic assumes that the uncertainties in the numerator and the denominator are uncorrelated, when in fact for many values of y they partly cancel out. (You can't apply this cancellation to aₙ, the term that accumulates all the roundoff error in your algorithm, just to the other aᵢ.) But the more interesting feature is that RAA hands you a linear function of the inputs that approximates the result of the calculation, which is pretty similar to calculating the gradient — but with error bounds on the approximation!
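Here's a minimal Python sketch of the reduced form, with a single lumped error term and no attempt to account for floating-point rounding in the bookkeeping itself (a real implementation would fold that into aₙ too), and with division left out:

    class Affine:
        # a0 + a1*x1 + ... + ak*xk + e*x_err, each x in [-1, 1]; e is the
        # lumped error term (the a_n above), absorbing linearization error.
        def __init__(self, center, partials, err=0.0):
            self.c, self.p, self.e = center, list(partials), err

        @staticmethod
        def _coerce(v, k):
            return v if isinstance(v, Affine) else Affine(float(v), [0.0] * k)

        def __add__(self, other):
            o = Affine._coerce(other, len(self.p))
            return Affine(self.c + o.c,
                          [a + b for a, b in zip(self.p, o.p)],
                          self.e + o.e)
        __radd__ = __add__

        def __neg__(self):
            return Affine(-self.c, [-a for a in self.p], self.e)

        def __sub__(self, other):
            return self + (-Affine._coerce(other, len(self.p)))

        def __rsub__(self, other):
            return (-self) + other

        def __mul__(self, other):
            # Linear terms multiply through the centers; the quadratic
            # cross-term is bounded by the product of the two radii and
            # dumped into the lumped error, along with the scaled errors.
            o = Affine._coerce(other, len(self.p))
            rad_s = sum(map(abs, self.p)) + self.e
            rad_o = sum(map(abs, o.p)) + o.e
            return Affine(self.c * o.c,
                          [self.c * b + o.c * a for a, b in zip(self.p, o.p)],
                          abs(self.c) * o.e + abs(o.c) * self.e + rad_s * rad_o)
        __rmul__ = __mul__

        def interval(self):
            r = sum(map(abs, self.p)) + self.e
            return (self.c - r, self.c + r)

    y = Affine(2.0, [0.5])            # y = 2 ± 0.5, i.e. y in [1.5, 2.5]
    print(((y + 1) - y).interval())   # (1.0, 1.0): the uncertainty cancels exactly
    print((y * (3 - y)).interval())   # (1.25, 2.75); plain interval arithmetic on
                                      # [1.5,2.5]*[0.5,1.5] gives (0.75, 3.75), and
                                      # the true range is (1.25, 2.25)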
— ⁂ —
LSM trees are only about as "obscure" as bloom filters, since they underlie several prominent filesystems, LevelDB, and most of the world's search engines, but if you don't know about bloom filters, you might not know about LSM trees either. I wrote a tutorial explanation of LSM trees in https://github.com/kragen/500lines/tree/master/search-engine, though the name "LSM tree" hadn't yet gone mainstream.
Binary array sets, as described in https://www.nayuki.io/page/binary-array-set, are closely related to LSM trees, to the point that you could describe them as an application of LSM trees. They are a simple data structure for the exact version of the problem bloom filters solve approximately: maintaining a set S of items supporting the operations S.add(item) and S.contains?(item). Insertion (.add) is amortized O(log N), since each element takes part in at most log N merges, and membership testing (.contains?) is O(log² N).
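A Python sketch of the structure, following Nayuki's description as I remember it (sorted() stands in for the linear merge a careful implementation would use):

    import bisect

    class BinaryArraySet:
        # levels[k] is either [] or a sorted list of exactly 2**k items.
        # Insertion propagates "carries" like binary addition, which is
        # where the kinship with LSM trees comes from.
        def __init__(self):
            self.levels = []

        def add(self, item):
            carry = [item]
            for k in range(len(self.levels)):
                if not self.levels[k]:
                    self.levels[k] = carry
                    return
                # merge two runs of equal size and carry the result upward
                carry = sorted(self.levels[k] + carry)
                self.levels[k] = []
            self.levels.append(carry)

        def __contains__(self, item):
            # one binary search per nonempty level: O(log^2 N) overall
            for level in self.levels:
                i = bisect.bisect_left(level, item)
                if i < len(level) and level[i] == item:
                    return True
            return False

    s = BinaryArraySet()
    for x in [5, 3, 9, 1, 7]:
        s.add(x)
    assert 9 in s and 4 not in s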
— ⁂ —
A different and more efficient data structure for the exact set-membership problem, limited to the case where the items to be stored are up to M integers in the range [0, N), supports constant-time creation, clearing, insertion, size measurement, and membership testing, plus linear-time enumeration, but I think it cannot be implemented in modern standard C because of the rules about reading uninitialized data. (By allowing creation to be linear time, you can implement it in standard C.) It uses two arrays, A[M] and B[N], and a counter n; the value i is a member of the set iff B[i] < n ∧ A[B[i]] == i, from which it is straightforward to work out the other operations.
I don't remember what this data structure is called, but I definitely didn't invent it. I described it and some uses for it in https://dercuano.github.io/notes/constant-time-sets-for-pixe.... You can associate additional values with this structure to use it as a sparse array.
Oh, I see kcbanner posted this one too, with a link to Russ Cox's writeup, which is better than mine and explains that it was published by Briggs and Torczon in 01993, but was probably not original to them, since it had already appeared as an exercise in Aho, Hopcroft, and Ullman's 01974 textbook and in Programming Pearls: https://research.swtch.com/sparse
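Here's a Python sketch of the invariant; in Python the arrays have to be initialized anyway, so creation here is linear-time, like the standard-C variant mentioned above, and the class and method names are just mine:

    class SparseSet:
        # B (the "sparse" array) maps a value to its slot in A (the "dense"
        # array); membership holds only when the two arrays point at each
        # other, so stale garbage left behind in B is harmless.
        def __init__(self, n_universe, m_capacity):
            self.A = [0] * m_capacity   # the members, in insertion order
            self.B = [0] * n_universe   # may contain stale junk
            self.n = 0                  # current number of members

        def __contains__(self, i):
            j = self.B[i]
            return j < self.n and self.A[j] == i

        def add(self, i):
            if i not in self:
                self.A[self.n] = i
                self.B[i] = self.n
                self.n += 1

        def clear(self):
            self.n = 0                  # constant time: B stays dirty on purpose

        def __len__(self):
            return self.n

        def __iter__(self):             # linear in the number of members, not in N
            return iter(self.A[:self.n])

    s = SparseSet(1000, 100)
    s.add(42); s.add(7)
    assert 42 in s and 8 not in s
    s.clear()
    assert 42 not in s and len(s) == 0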
— ⁂ —
A gap buffer is a data structure sort of similar to Huet's zipper, but without the dynamic allocation; Emacs famously uses them for its editing buffers. It consists of an array containing a leading chunk of text, followed by a gap, followed by a trailing chunk of text. Edits are always made in the gap; to start editing at a different position, you have to move some text from the end of the leading chunk to the beginning of the trailing chunk or vice versa.
Suppose you want to edit a 531-byte file; you allocate a larger buffer, say 1024 bytes, and read those 531 bytes into the beginning of the buffer. If the user starts inserting text after byte 200, you begin by moving bytes 201–530 (330 bytes) to the end of the 1024-byte buffer (positions 694–1023); then you can insert the user's text at positions 201, 202, 203, etc., without having to do any more work to move text. If the user moves up to byte 190 and inserts more text, you only have to move bytes 191–203 or whatever to the end of the buffer.
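Here's a minimal Python sketch of a gap buffer over bytes (names are just for the sketch); it leaves out growing the buffer when the gap fills up, which a real implementation obviously needs:

    class GapBuffer:
        # text == buf[:gap_start] + buf[gap_end:]; all edits happen at gap_start.
        def __init__(self, text, capacity=1024):
            capacity = max(capacity, 2 * len(text))
            self.buf = bytearray(capacity)
            self.buf[:len(text)] = text
            self.gap_start, self.gap_end = len(text), capacity

        def move_gap(self, pos):
            # Slide text across the gap so the gap starts at pos.
            if pos < self.gap_start:      # cursor moved left
                n = self.gap_start - pos
                self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
                self.gap_start, self.gap_end = pos, self.gap_end - n
            elif pos > self.gap_start:    # cursor moved right
                n = pos - self.gap_start
                self.buf[self.gap_start:pos] = self.buf[self.gap_end:self.gap_end + n]
                self.gap_start, self.gap_end = pos, self.gap_end + n

        def insert(self, pos, text):      # assumes the gap is big enough
            self.move_gap(pos)
            self.buf[self.gap_start:self.gap_start + len(text)] = text
            self.gap_start += len(text)

        def delete(self, pos, n):
            self.move_gap(pos)
            self.gap_end += n             # deleted bytes just join the gap

        def text(self):
            return bytes(self.buf[:self.gap_start] + self.buf[self.gap_end:])

    b = GapBuffer(b"hello world")
    b.insert(5, b", cruel")
    b.delete(0, 5)
    b.insert(0, b"goodbye")
    assert b.text() == b"goodbye, cruel world"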
A gap buffer is very fast in the usual case of making a sequence of small edits at the same location or at nearby locations, and because the constant factor of copying a large block of bytes from one place to another is so small, even moving the gap from one end of the buffer to the other is generally plenty fast for interactive use. Moreover, operations like string search are very fast in a gap buffer, while they tend to be very slow in fancier data structures with better asymptotic complexity, such as piece tables and ropes; and a gap buffer uses a predictable and relatively small amount of extra memory.
— ⁂ —
However, like FORTH, fancy data structures are a bit of an attractive nuisance: they tempt you to get into trouble. When I first came across Sedgewick's Algorithms in C it was a revelation to me: here were so many clever ways of breaking down problems that made apparently impossible problems easy. (I had come across sorting algorithms before but I didn't really understand them.) I immediately and erroneously concluded that this was the knowledge I had been missing to program all the things that I didn't know how to program, and I began my counterproductive data-structure-fetishism phase.
Much better advice is found in The Practice of Programming: almost all programs can be written without any data structures but arrays, hash tables, linked lists, and, for things like parsing, symbolic algebra, or filesystems, trees. So just use those if you can.
Occasionally, a program is only possible, or is dramatically better, using fancy data structures like LSM-trees, bloom filters, suffix arrays, skip lists, HAMTs, union find, zippers, treaps, BDDs, ropes, gap buffers, Fenwick trees, splay trees, and so on. And that can be super fun! But this situation is pretty rare, and it's important not to get yourself into the position where you're implementing a red-black tree or HAMTs when it would have been a lot easier to write your program in a brute-force way. Or where your successors are fixing bugs in your Fenwick tree implementation.
Also, it's important to be aware that, quite aside from the difficulty of writing and understanding the code, fancy data structures often have extra costs at runtime which can outweigh their benefits. It's easy to discover that your splay tree runs slower than sequential search over an unsorted array, for example, and of course it uses much more RAM.
> Much better advice is found in The Practice of Programming: almost all programs can be written without any data structures but arrays, hash tables, linked lists, and, for things like parsing, symbolic algebra, or filesystems, trees. So just use those if you can.
In general, as much as possible use stuff you already have good libraries for.
Often, you can get away with much simpler data structures with a bit of clever pre-sorting (or sometimes: random shuffling). Relatedly, batching your operations can also often make your data structure needs simpler.
I'm not that enthusiastic about libraries. Yes, using a good LSM-tree library will keep you and your successors from spending holiday weekends debugging your LSM-tree implementation — probably, because "good" doesn't mean "perfect". But it won't be a tenth as fast as an in-RAM hash table, if an in-RAM hash table can do the job. And CPython's standard hash table usually won't be a tenth as fast as a domain-specific hash table optimized for your needs.
That doesn't always matter very much, but it does always matter. Especially now with the advent of good property-based testing libraries like Hypothesis, if you can run a function call in 0.1 ms instead of 1.0 ms, you can do ten times as much testing with a given amount of resources.
I like Hypothesis just as much as the next guy. It's awesome.
You have some valid points: when you have specialised needs, you might need to write specialised software.
Though even for your example, I would suggest you write your domain-specific hash table as if it were a library, if possible (instead of embedding it deep in your application code). Trying to make your code testable with Hypothesis strongly encourages such an approach anyway.
(To show that my advice ain't trivial, I give a counterexample where you can't do this as easily: C folks sometimes write things like intrusive linked lists, or intrusive data structures in general. Almost no language gives you good tools to write these as a library, so testing properties in isolation is hard to do, too.)