HAMT: Hash Array Mapped Trie. This data structure makes efficient immutable data possible. You can update a list of a million items, and keep a reference to the original list, by changing 3 or 4 references and some bytes.
This should replace copy-on-write for scripting languages. I really want to see it in a JS spec soon. There are libraries that can do it, but they add translation penalties and extra steps. I'd complain less if I could use ClojureScript professionally, because it and Clojure are built around them.
What I like about HAMTs is that they can be super simple to implement if you make them one bit per level. They're like a combination of a binary search tree and a hash table, but without the annoyances of either (see the sketch after this list).
* In binary search trees, you need to balance the tree on every insert, because the tree degenerates into a linked list if you insert keys in sorted order. In a HAMT, positions are determined by the hash, so they're effectively random and don't depend on the order you insert nodes in.
* In binary search trees, you need to perform some complicated steps to remove a node. In a HAMT, you can either just mark the node as deleted, or if you want to fill the gap, find a leaf node that has the removed node on its path and just put it there.
* In hash tables, when the table starts to get full, you need to rehash it or put the added item in the "wrong" slot or something. A HAMT never gets full, because it's a tree: you just add another child node.
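To make the one-bit-per-level idea concrete, here's a minimal persistent map sketch in TypeScript. The names and the toy FNV-1a hash are mine, not from any particular library, and it assumes no two keys share a full 32-bit hash (a real implementation adds collision buckets for that case):

```ts
// A node is either a leaf holding one entry, or a branch whose two
// children are selected by successive bits of the key's hash.
type Node<V> =
  | { kind: "leaf"; key: string; hash: number; val: V }
  | { kind: "branch"; zero?: Node<V>; one?: Node<V> };

// Toy 32-bit FNV-1a hash; any decent hash works.
function hash(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Persistent insert via path copying: only nodes on the root-to-leaf
// path are re-created; everything else is shared with the old version.
function insert<V>(node: Node<V> | undefined, key: string, val: V,
                   h: number, depth: number): Node<V> {
  if (node === undefined) return { kind: "leaf", key, hash: h, val };
  if (node.kind === "leaf") {
    if (node.key === key) return { kind: "leaf", key, hash: h, val };
    // Two different keys landed here: push the old leaf one level down
    // into a fresh branch, then retry the insert on that branch.
    const oldBit = (node.hash >>> depth) & 1;
    const branch: Node<V> =
      oldBit ? { kind: "branch", one: node } : { kind: "branch", zero: node };
    return insert(branch, key, val, h, depth);
  }
  const bit = (h >>> depth) & 1;
  return bit
    ? { kind: "branch", zero: node.zero, one: insert(node.one, key, val, h, depth + 1) }
    : { kind: "branch", zero: insert(node.zero, key, val, h, depth + 1), one: node.one };
}

function get<V>(node: Node<V> | undefined, key: string,
                h: number, depth: number): V | undefined {
  if (node === undefined) return undefined;
  if (node.kind === "leaf") return node.key === key ? node.val : undefined;
  const bit = (h >>> depth) & 1;
  return get(bit ? node.one : node.zero, key, h, depth + 1);
}

// Both versions stay usable; they share all untouched nodes.
const v1 = insert(undefined, "a", 1, hash("a"), 0);
const v2 = insert(v1, "b", 2, hash("b"), 0);
console.log(get(v1, "b", hash("b"), 0)); // undefined
console.log(get(v2, "b", hash("b"), 0)); // 2
```

Production HAMTs use 5 bits per level (32-way fanout) plus a bitmap and popcount to keep nodes compact, but the path-copying trick is exactly the same.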
Another interesting approach to copy-on-write for immutable collections (for example, arrays) is to actively mutate the array in place, but leave the original as a lazy description/view of how to get back to the previous state.
From the outside the effect is the same, but performance is optimized for accessing the updated collection rather than the old one.
Great for cases where you want the immutability guarantee, but where the old value is unlikely to actually be used in the common case.
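A minimal sketch of that approach in TypeScript, sometimes called a diff array or rerootable persistent array (all names here are hypothetical): the newest version owns the flat array, and older versions become a chain of undo records that are only replayed if someone actually reads them.

```ts
// Each version is a mutable box: either it owns the real array, or it
// is a diff saying "like `base`, but slot `index` holds `value`".
type Rep<T> =
  | { kind: "arr"; data: T[] }
  | { kind: "diff"; index: number; value: T; base: PArr<T> };

interface PArr<T> { rep: Rep<T>; }

function fromArray<T>(items: T[]): PArr<T> {
  return { rep: { kind: "arr", data: items.slice() } };
}

// set() is O(1): the new version is just a diff on top of the old one.
function set<T>(v: PArr<T>, index: number, value: T): PArr<T> {
  return { rep: { kind: "diff", index, value, base: v } };
}

// Reading "reroots": diffs between v and the flat array are replayed so
// that v ends up owning the array, while the versions passed over become
// diffs that describe how to undo each write.
function reroot<T>(v: PArr<T>): T[] {
  if (v.rep.kind === "arr") return v.rep.data;
  const { index, value, base } = v.rep;
  const data = reroot(base);
  base.rep = { kind: "diff", index, value: data[index], base: v };
  data[index] = value;
  v.rep = { kind: "arr", data };
  return data;
}

function get<T>(v: PArr<T>, index: number): T {
  return reroot(v)[index];
}

const a = fromArray([1, 2, 3]);
const b = set(a, 0, 99);
console.log(get(b, 0)); // 99 -- fast path from now on
console.log(get(a, 0)); // 1  -- replays the undo record
```

Reads and writes on the newest version stay O(1); touching an old version costs proportional to how far it has drifted, which is exactly the "old value probably won't be used" bet.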
Yeah, TC39 has a habit of letting proposals languish at stage 2 for years. I wouldn't give up hope though. The decorators proposal finally made it to stage 3 after spending an eternity at stage 2. According to the slides from the TC39 meeting this month [1][2], it looks like they'll be proposing advancement to stage 3 in September.
This. Roughly a year ago I got interested in efficient immutability for my write-from-scratch-in-C Lisp [0] and started to write a HAMT implementation in C [1], along with a (somewhat hacky, you have been warned) benchmarking suite [2].
The docs are only 70% done (in particular, the "putting it all together" part is missing), but it has been a really interesting and enlightening journey so far, and I can only recommend embarking on this path to everyone.
As a case study, we evaluated the performance of this along with other languages that use immutable data structures for a recent production app we were building. After that evaluation, we concluded that the penalty of using immutable HAMTs on the Node/JS platform is just too high, at least in its current library form. Immutable structures are often somewhat slower, but the relative penalty of a HAMT on Node/JS versus a straight mutable structure was much higher than in other languages. Floats for all numbers, implicit conversions back and forth, a missing popcount intrinsic, and other such overheads make the relative penalty of an immutable HAMT versus a mutable JS Map much higher than in languages such as .NET or Java, let alone C/Rust.
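For anyone wondering about the popcount point: a wide (e.g. 32-way) HAMT node stores a bitmap of which children exist plus a dense array of only the present children, and locating a child means counting the set bits below its position. Without a popcount intrinsic you fall back to bit twiddling, roughly like this sketch:

```ts
// Count set bits in a 32-bit integer (SWAR popcount).
function popcount32(x: number): number {
  x = x - ((x >>> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  return Math.imul((x + (x >>> 4)) & 0x0f0f0f0f, 0x01010101) >>> 24;
}

// Slot of the child for the 5-bit hash fragment `frag` (0..31) in a
// node whose `bitmap` marks which of the 32 children are present
// (hypothetical node layout).
function childIndex(bitmap: number, frag: number): number {
  return popcount32(bitmap & ((1 << frag) - 1));
}
```

Java's Integer.bitCount and .NET's BitOperations.PopCount typically compile down to a single POPCNT instruction; a JS engine may or may not recognize the pattern, so the per-lookup overhead adds up.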
Given that Node is mostly single-threaded as well, some of the advantages (not all!) of immutability diminish on that platform, making the performance penalty harder to swallow, at least for my use case.
It may not matter to your case, of course, but I find the benefits of these structures come from their quicker clone/copy time in many applications, snapshot-like characteristics, cheap diffs of large collections, etc. For this to matter, the collection has to be of a decent size, where a linear copy/equals/etc. is too slow for the use case. At that point speed starts to matter, as the cost of random lookup drifts further away from a standard mutable map's; for JS we deemed the penalty too high and decided on another backend platform more amenable to writing generic data structures.
Yes, I evaluated it. The complexity of knowing where to convert to/from plain JS, plus the extra library syntax to learn, plus the performance cost of toJS, made it a poor fit for my particular use case. Nearly as much of a hard sell at work as saying "let's rebuild the UI in ClojureScript," and without providing as much benefit.
My use case is pretty atypical though, and it’s worth checking out if you have more reliable data shapes/update paths.
The key design difference between the two is that immutable.js wants you to keep your stuff in its format (and operate on your stuff with its functions) until you hit some piece of code that needs a plain JS object. Whereas immer is scoped (mostly?) to update functions, and always returns plain JS. Immer seems to be designed to simplify the problem of only changing references to changed objects—and it looks like it does a good job of it.
So with immutable.js, you get more powerful data manipulation throughout the app, at the cost of having to know when and where to convert things back to plain JS. With immer, you get tightly-scoped immutable updates to objects, and inside the update function you can treat them as if they're mutable, letting you write more idiomatic-looking JS. Instead of spreading 4 layers of objects to update one state flag, immer lets you say something like this (with a hypothetical state shape):
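```ts
import { produce } from "immer";

// Hypothetical 4-level state shape, not from the library's docs.
const state = {
  view: { settings: { panel: { isOpen: false } }, theme: "dark" },
  user: { name: "x" },
};

const next = produce(state, (draft) => {
  draft.view.settings.panel.isOpen = true; // mutate the draft freely
});

console.log(next.view.settings.panel.isOpen);  // true
console.log(state.view.settings.panel.isOpen); // false; original untouched
console.log(next.user === state.user);         // true; untouched branch shared
```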
Every object reference in that path will be updated, but the rest of “state”, “state.view”, etc. will be unchanged.
If you can keep everything inside of immutable.js, it is the fastest thing out there. As soon as you have to drop back to plain JS, it gets slower. See this performance graph: https://immerjs.github.io/immer/performance/
Thanks for reminding me of this. We finally dropped IE11 support, so we may be able to get some benefits from introducing immer, either by itself or by bringing in Redux Toolkit.
I imagine careless use of such a structure would be an easy way to create a memory leak.
Is it possible to create persistent collections in JS that will free data no longer directly referenced?
Careless use of shallow copies of objects (with JS spread syntax) presents the same issue. Properly scoping assigned constants and variables is still important.
Persistent collections are available today via immutable.js, so it can be done. The catch is that you have to use a library for it and transform the collections back into plain JS when something expects an object or an array. The language itself could make this transparent and lower-cost by implementing it at the engine level.
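To make that concrete (and to answer the memory question upthread), a small sketch with immutable.js; the trie nodes are ordinary JS objects underneath, so garbage collection needs no special support:

```ts
import { Map } from "immutable";

const m1 = Map({ a: 1, b: 2 });
const m2 = m1.set("c", 3); // m1 untouched; m2 shares structure with m1

// The boundary tax: anything expecting a plain object needs a conversion.
console.log(m2.toJS()); // { a: 1, b: 2, c: 3 }

// GC-wise there is no special machinery: trie nodes are ordinary JS
// objects, so once nothing references a version anymore, any node not
// shared with a still-live version is collected as usual.
```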
Persistent collections are primarily useful in functional programming paradigms. JS is a multi-paradigmatic language, so it doesn’t make sense to use them as the default. It would sure be nice to be able to opt into using them in FP-aligned frameworks like the current React ecosystem though.
https://en.m.wikipedia.org/wiki/Hash_array_mapped_trie