HAMT: Hash Array Mapped Trie. This data structure makes efficient immutable data possible. You can update a list of a million items, and keep a reference to the original list, by changing 3 or 4 references and some bytes.
This should replace copy-on-write for scripting languages. I really want to see it in a JS spec soon. There are libraries that can do it, but they add translation penalties and extra steps. I'd complain less if I could use ClojureScript professionally, because it and Clojure are built around them.
What I like about HAMTs is that they can be super simple to implement if you make them one bit per level. They're like a combination of a binary search tree and a hash table, but without the annoyances of either (see the sketch after this list).
* In binary search trees, you need to balance the tree on every insert, because the tree degenerates into a linked list if you insert keys in sorted order. In a HAMT, positions are determined by the hash, so they're effectively random and don't depend on the order you insert nodes in.
* In binary search trees, you need to perform some complicated steps to remove a node. In a HAMT, you can either just mark the node as deleted, or if you want to fill the gap, find a leaf node that has the removed node on its path and just put it there.
* In hash tables, when the table starts to get full, you need to rehash it or put the added item in the "wrong" slot or something. A HAMT never gets full, because it's a tree: you just add another child node.
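To make the one-bit-per-level idea concrete, here's a minimal persistent map sketch in TypeScript. The names and the toy FNV-1a hash are mine, not from any particular library, and it assumes no two keys share a full 32-bit hash (a real implementation adds collision buckets for that case):

```ts
// A node is either a leaf holding one entry, or a branch whose two
// children are selected by successive bits of the key's hash.
type Node<V> =
  | { kind: "leaf"; key: string; hash: number; val: V }
  | { kind: "branch"; zero?: Node<V>; one?: Node<V> };

// Toy 32-bit FNV-1a hash; any decent hash works.
function hash(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Persistent insert via path copying: only nodes on the root-to-leaf
// path are re-created; everything else is shared with the old version.
function insert<V>(node: Node<V> | undefined, key: string, val: V,
                   h: number, depth: number): Node<V> {
  if (node === undefined) return { kind: "leaf", key, hash: h, val };
  if (node.kind === "leaf") {
    if (node.key === key) return { kind: "leaf", key, hash: h, val };
    // Two different keys landed here: push the old leaf one level down
    // into a fresh branch, then retry the insert on that branch.
    const oldBit = (node.hash >>> depth) & 1;
    const branch: Node<V> =
      oldBit ? { kind: "branch", one: node } : { kind: "branch", zero: node };
    return insert(branch, key, val, h, depth);
  }
  const bit = (h >>> depth) & 1;
  return bit
    ? { kind: "branch", zero: node.zero, one: insert(node.one, key, val, h, depth + 1) }
    : { kind: "branch", zero: insert(node.zero, key, val, h, depth + 1), one: node.one };
}

function get<V>(node: Node<V> | undefined, key: string,
                h: number, depth: number): V | undefined {
  if (node === undefined) return undefined;
  if (node.kind === "leaf") return node.key === key ? node.val : undefined;
  const bit = (h >>> depth) & 1;
  return get(bit ? node.one : node.zero, key, h, depth + 1);
}

// Both versions stay usable; they share all untouched nodes.
const v1 = insert(undefined, "a", 1, hash("a"), 0);
const v2 = insert(v1, "b", 2, hash("b"), 0);
console.log(get(v1, "b", hash("b"), 0)); // undefined
console.log(get(v2, "b", hash("b"), 0)); // 2
```

Production HAMTs use 5 bits per level (32-way fanout) plus a bitmap and popcount to keep nodes compact, but the path-copying trick is exactly the same.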
Another interesting approach to copy-on-write for immutable collections (for example, arrays) is to actively mutate the array in place, but leave the original as a lazy description/view of how to get back to the previous state.
From the outside the effect is the same, but performance is optimized for accessing the updated collection rather than the old one.
Great for cases where you want the immutability guarantee, but where the old value is unlikely to actually be used in the common case.
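A minimal sketch of that approach in TypeScript, sometimes called a diff array or rerootable persistent array (all names here are hypothetical): the newest version owns the flat array, and older versions become a chain of undo records that are only replayed if someone actually reads them.

```ts
// Each version is a mutable box: either it owns the real array, or it
// is a diff saying "like `base`, but slot `index` holds `value`".
type Rep<T> =
  | { kind: "arr"; data: T[] }
  | { kind: "diff"; index: number; value: T; base: PArr<T> };

interface PArr<T> { rep: Rep<T>; }

function fromArray<T>(items: T[]): PArr<T> {
  return { rep: { kind: "arr", data: items.slice() } };
}

// set() is O(1): the new version is just a diff on top of the old one.
function set<T>(v: PArr<T>, index: number, value: T): PArr<T> {
  return { rep: { kind: "diff", index, value, base: v } };
}

// Reading "reroots": diffs between v and the flat array are replayed so
// that v ends up owning the array, while the versions passed over become
// diffs that describe how to undo each write.
function reroot<T>(v: PArr<T>): T[] {
  if (v.rep.kind === "arr") return v.rep.data;
  const { index, value, base } = v.rep;
  const data = reroot(base);
  base.rep = { kind: "diff", index, value: data[index], base: v };
  data[index] = value;
  v.rep = { kind: "arr", data };
  return data;
}

function get<T>(v: PArr<T>, index: number): T {
  return reroot(v)[index];
}

const a = fromArray([1, 2, 3]);
const b = set(a, 0, 99);
console.log(get(b, 0)); // 99 -- fast path from now on
console.log(get(a, 0)); // 1  -- replays the undo record
```

Reads and writes on the newest version stay O(1); touching an old version costs proportional to how far it has drifted, which is exactly the "old value probably won't be used" bet.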
Yeah, TC39 has a habit of letting proposals languish at stage 2 for years. I wouldn't give up hope though. The decorators proposal finally made it to stage 3 after spending an eternity at stage 2. According to the slides from the TC39 meeting this month [1][2], it looks like they'll be proposing advancement to stage 3 in September.
This. Roughly a year ago I got interested in efficient immutability for my write-from-scratch-in-C Lisp [0] and started to write a HAMT implementation in C [1], along with a (somewhat hacky, you have been warned) benchmarking suite [2].
The docs are only 70% done (in particular, the "putting it all together" part is missing), but it has been a really interesting and enlightening journey so far, and I can only recommend embarking on this path to everyone.
As a case study, we evaluated the performance of this along with other languages that use immutable data structures for a recent production app we were building. After that evaluation, we concluded that the penalty of using immutable HAMTs on the Node/JS platform is just too high, at least in its current library form. Immutable structures are often somewhat slower, but the relative penalty of a HAMT on Node/JS versus a straight mutable structure was much higher than in other languages. Floats for all numbers, implicit conversions back and forth, a missing popcount intrinsic, and other such overheads make the relative penalty of an immutable HAMT versus a mutable JS Map much higher than in languages such as .NET or Java, let alone C/Rust.
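For anyone wondering about the popcount point: a wide (e.g. 32-way) HAMT node stores a bitmap of which children exist plus a dense array of only the present children, and locating a child means counting the set bits below its position. Without a popcount intrinsic you fall back to bit twiddling, roughly like this sketch:

```ts
// Count set bits in a 32-bit integer (SWAR popcount).
function popcount32(x: number): number {
  x = x - ((x >>> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  return Math.imul((x + (x >>> 4)) & 0x0f0f0f0f, 0x01010101) >>> 24;
}

// Slot of the child for the 5-bit hash fragment `frag` (0..31) in a
// node whose `bitmap` marks which of the 32 children are present
// (hypothetical node layout).
function childIndex(bitmap: number, frag: number): number {
  return popcount32(bitmap & ((1 << frag) - 1));
}
```

Java's Integer.bitCount and .NET's BitOperations.PopCount typically compile down to a single POPCNT instruction; a JS engine may or may not recognize the pattern, so the per-lookup overhead adds up.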
Given that Node is mostly single-threaded as well, some of the advantages (not all!) of immutability diminish on that platform, making the performance penalty harder to swallow, at least for my use case.
It may not matter to your case, of course, but I find the benefits of these structures come from their quicker clone/copy time in many applications, snapshot-like characteristics, cheap diffs of large collections, etc. For this to matter, the collection has to be of a decent size, where a linear copy/equals/etc. is too slow for the use case. At that point speed starts to matter, as the cost of random lookup drifts further away from a standard mutable map's; for JS we deemed the penalty too high and decided on another backend platform more amenable to writing generic data structures.
Yes, I evaluated it. The complexity of knowing where to convert to/from plain JS, plus the extra library syntax to learn, plus the performance cost of toJS, made it a poor fit for my particular use case. Nearly as much of a hard sell at work as saying "let's rebuild the UI in ClojureScript," and without providing as much benefit.
My use case is pretty atypical though, and it’s worth checking out if you have more reliable data shapes/update paths.
The key design difference between the two is that immutable.js wants you to keep your stuff in its format (and operate on your stuff with its functions) until you hit some piece of code that needs a plain JS object. Whereas immer is scoped (mostly?) to update functions, and always returns plain JS. Immer seems to be designed to simplify the problem of only changing references to changed objects—and it looks like it does a good job of it.
So with immutable.js, you get more powerful data manipulation throughout the app, at the cost of having to know when and where to convert things back to plain JS. With immer, you get tightly-scoped immutable updates to objects, and inside the update function you can treat them as if they're mutable, letting you write more idiomatic-looking JS. Instead of spreading 4 layers of objects to update one state flag, immer lets you say something like this (with a hypothetical state shape):
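```ts
import { produce } from "immer";

// Hypothetical 4-level state shape, not from the library's docs.
const state = {
  view: { settings: { panel: { isOpen: false } }, theme: "dark" },
  user: { name: "x" },
};

const next = produce(state, (draft) => {
  draft.view.settings.panel.isOpen = true; // mutate the draft freely
});

console.log(next.view.settings.panel.isOpen);  // true
console.log(state.view.settings.panel.isOpen); // false; original untouched
console.log(next.user === state.user);         // true; untouched branch shared
```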
Every object reference in that path will be updated, but the rest of “state”, “state.view”, etc. will be unchanged.
If you can keep everything inside of immutable.js, it is the fastest thing out there. As soon as you have to drop back to plain JS, it gets slower. See this performance graph: https://immerjs.github.io/immer/performance/
Thanks for reminding me of this. We finally dropped IE11 support, so we may be able to get some benefits from introducing immer, either by itself or by bringing in Redux Toolkit.
I imagine careless use of such a structure would be an easy way to create a memory leak.
Is it possible to create persistent collections in JS that will free data no longer directly referenced?
Careless use of shallow copies of objects (with JS spread syntax) presents the same issue. Properly scoping assigned constants and variables is still important.
Persistent collections are available today via immutable.js, so it can be done. The catch is that you have to use a library for it and transform the collections back into plain JS when something expects an object or an array. The language itself could make this transparent and lower-cost by implementing it at the engine level.
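To make that concrete (and to answer the memory question upthread), a small sketch with immutable.js; the trie nodes are ordinary JS objects underneath, so garbage collection needs no special support:

```ts
import { Map } from "immutable";

const m1 = Map({ a: 1, b: 2 });
const m2 = m1.set("c", 3); // m1 untouched; m2 shares structure with m1

// The boundary tax: anything expecting a plain object needs a conversion.
console.log(m2.toJS()); // { a: 1, b: 2, c: 3 }

// GC-wise there is no special machinery: trie nodes are ordinary JS
// objects, so once nothing references a version anymore, any node not
// shared with a still-live version is collected as usual.
```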
Persistent collections are primarily useful in functional programming paradigms. JS is a multi-paradigmatic language, so it doesn’t make sense to use them as the default. It would sure be nice to be able to opt into using them in FP-aligned frameworks like the current React ecosystem though.
https://en.m.wikipedia.org/wiki/Hash_array_mapped_trie