Can someone explain the reasoning for using a HAMT here? I have played around with hashtables quite a bit, and although HAMTs are cool and a lot more memory efficient, they are likely not faster and will at some point suffer from more severe memory fragmentation than hashtables.
It is not clear from the article that the author needed a persistent data structure.
It's also, frankly, not clear they'd benefit from using a HAMT at all: the only measurements they present are of their problem domain, not of any implementation, and many of the asymptotics they cite are also true of hashmaps.
If their initial hashmap approach did have O(N) inserts because they copied the whole map for “immutability” every time, then sure, HAMTs will beat the pants off of it—but did it?
Sure. Map the HAMT to an index into a persistent vector or something, e.g. Scala's VectorMap and TreeSeqMap. You could also reuse the old "linked list map" idea, but thread the keys through: whenever you add a value to the map, add a triple of `(Some(last_key), value, None)` and update the last key's value so that its third item becomes the key of the new item.
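A minimal sketch of that triple scheme, using `std::collections::HashMap` as a mutable stand-in for the HAMT (the `OrderedMap` type and its methods are made up for illustration, and it assumes each key is only inserted once):

```rust
use std::collections::HashMap;

// Each entry stores (prev_key, value, next_key), so the map can be
// enumerated oldest-to-newest by following the `next` links.
struct OrderedMap {
    entries: HashMap<String, (Option<String>, i32, Option<String>)>,
    first: Option<String>,
    last: Option<String>,
}

impl OrderedMap {
    fn new() -> Self {
        OrderedMap { entries: HashMap::new(), first: None, last: None }
    }

    fn insert(&mut self, key: String, value: i32) {
        // The new entry points back at the current last key.
        let entry = (self.last.clone(), value, None);
        if let Some(last) = &self.last {
            // Update the old last entry's `next` to point at the new key.
            self.entries.get_mut(last).unwrap().2 = Some(key.clone());
        } else {
            self.first = Some(key.clone());
        }
        self.entries.insert(key.clone(), entry);
        self.last = Some(key);
    }

    // Walk from the oldest key to the newest via the `next` links.
    fn keys_in_order(&self) -> Vec<String> {
        let mut out = Vec::new();
        let mut cur = self.first.clone();
        while let Some(k) = cur {
            cur = self.entries[&k].2.clone();
            out.push(k);
        }
        out
    }
}
```

In a truly persistent HAMT the "update the last key's value" step would path-copy one extra branch per insert, but the linking idea is the same.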
>Sure. Map the HAMT to an index into a persistent vector or something e.g. Scala's VectorMap and TreeSeqMap
I do not have a persistent vector, only a HAMT.
Perhaps the HAMT becomes a standard trie if one sets hashcode(i) = i, and then it can be enumerated in ascending key order?
But the double lookup in two data structures is unnecessarily slow. The first HAMT could store (Index, Value) pairs and only use the "vector" for enumeration.
But I was hoping for a single data structure that can do lookups and ordered enumerations.
> You could also reuse the old "linked list map" idea, but thread the keys through: whenever you add a value to the map, add a triple of `(Some(last_key), value, None)` and update the last key's value so that its third item becomes the key of the new item.
I have been using a linked list as the sole storage so far. The lookup was O(n), which was fast enough for my data, but I wanted to improve it to be more scalable.
But enumerating the list is in reverse, is it not? Going from the newest key to the oldest key. I want it to go from the oldest to the newest key.
In any case it should be better to store pointers to HAMT nodes rather than keys.
> But the double lookup in two data structures is unnecessarily slow.
Not necessarily. You'd have 2xlog(n), and because you wouldn't be storing the data in the HAMT it could be denser / wider and thus fit better in cache. At least that's what happened with Raymond Hettinger's (non-persistent) naturally ordered dict: rather than a sparse array of buckets, the hashmap is a sparse array of indices plus a dense array of buckets, which gives much better memory density. Even more so with an adaptive sparse array, where the stored indices can be u8, u16 or usize depending on the number of items: most maps will only need a sparse array of u8, leading to even better memory behaviour for the most problematic part.
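For reference, a stripped-down sketch of that two-array layout (fixed table size, no deletion or resizing, u8 indices only; all names are hypothetical, this is not the CPython implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sparse array of small indices + dense array of entries:
// iteration order is insertion order, and the sparse part stays tiny.
struct CompactDict {
    indices: Vec<Option<u8>>,          // sparse: one byte per slot
    entries: Vec<(u64, String, i32)>,  // dense: (hash, key, value) in insertion order
}

impl CompactDict {
    fn new() -> Self {
        // 64 slots, power of two so masking works; a real table would resize.
        CompactDict { indices: vec![None; 64], entries: Vec::new() }
    }

    fn hash_of(key: &str) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish()
    }

    fn insert(&mut self, key: &str, value: i32) {
        let h = Self::hash_of(key);
        let mask = self.indices.len() - 1;
        let mut slot = (h as usize) & mask;
        // Linear probing over the sparse index array only.
        loop {
            match self.indices[slot] {
                None => {
                    self.indices[slot] = Some(self.entries.len() as u8);
                    self.entries.push((h, key.to_string(), value));
                    return;
                }
                Some(i) => {
                    let e = &mut self.entries[i as usize];
                    if e.0 == h && e.1 == key {
                        e.2 = value; // overwrite existing key in place
                        return;
                    }
                    slot = (slot + 1) & mask;
                }
            }
        }
    }

    fn get(&self, key: &str) -> Option<i32> {
        let h = Self::hash_of(key);
        let mask = self.indices.len() - 1;
        let mut slot = (h as usize) & mask;
        loop {
            match self.indices[slot] {
                None => return None,
                Some(i) => {
                    let e = &self.entries[i as usize];
                    if e.0 == h && e.1 == key {
                        return Some(e.2);
                    }
                    slot = (slot + 1) & mask;
                }
            }
        }
    }

    // Insertion-order iteration just walks the dense vector.
    fn keys(&self) -> Vec<&str> {
        self.entries.iter().map(|e| e.1.as_str()).collect()
    }
}
```

The probing touches only the byte-sized sparse array, so collisions stay cheap; the dense array is read once per hit.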
> But enumerating the list is in reverse, is it not? Going from the newest key to the oldest key. I want it to go from the oldest to the newest key.
Not if you have a doubly linked list, but that increases your costs in storage and update complexity.
> In any case it should be better to store pointers to HAMT nodes rather than keys.
If you're implementing the entire data structure then yes; I was rather assuming you were on the outside, working with an existing HAMT.
I have not read about them. Might take a while to implement.
I use Pascal, but it is hard to find Pascal libraries for anything: either they do not exist at all, or they have not been maintained for over a decade.
So if I want to use a data structure, I usually need to implement it myself.
>Not necessarily. You'd have 2xlog(n) and because you wouldn't be storing the data in the HAMT it could be denser / wider and thus better fit in cache.
If I reuse the HAMT as the persistent vector, it is not so good. There is no grouping: each node is allocated separately and could be anywhere in memory, so one lookup could mean five cache misses.
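As a rough sanity check on that number (hypothetical helper, assuming a 32-way trie that consumes 5 hash bits per level and one potential cache miss per separately allocated node on the path):

```rust
// Rough path length for a 32-way HAMT holding n keys:
// each level consumes 5 hash bits, so depth grows as log32(n).
fn hamt_depth(n: u64) -> u32 {
    let bits = 64 - n.leading_zeros(); // ~ceil(log2(n)) for n > 0
    (bits + 4) / 5                     // round up to whole 5-bit levels
}
```

So around a million entries the path is about 4 nodes deep, which with the root/entry indirection lines up with "could be five cache misses".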
>At least that's what happened with Raymond Hettinger's (non-persistent) naturally ordered dict: rather than a sparse array of buckets, the hashmap is a sparse array of indices and a dense array of buckets leading to much better memory density (even more so with an adaptive sparse array where the stored indices can be u8, u16 or usize depending on the number of items: most maps will only need a sparse array of u8 leading to even better memory behaviour for the most problematic part).
I use such a hashmap, too, for the insertion order (there are Pascal libraries with hashmaps, but even then I needed some weeks to customize one). But all the indices there are 32-bit. Perhaps I will add the adaptiveness to it, 16-bit at least. 8-bit seems hardly worth it; at most it can save 256 bytes.
>> But enumerating the list is in reverse, is it not? Going from the newest key to the oldest key. I want it to go from the oldest to the newest key.
>Not if you have a doubly linked list, but that increases your costs in storage and update complexity
But a doubly linked list would not be persistent.
Arbitrarily many new nodes could point to one old node, but the old node can only point to one of them.
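Compare the persistent singly linked (cons) list, where that asymmetry is exactly what makes sharing work: prepending leaves the old tail untouched. A minimal Rust sketch:

```rust
use std::rc::Rc;

// Persistent singly linked list: many new heads can share one old tail,
// and the old nodes never need a back-pointer (which is why a persistent
// doubly linked list is impossible without changing representation).
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

// Prepending shares the old tail instead of copying it.
fn prepend(tail: &Rc<List>, head: i32) -> Rc<List> {
    Rc::new(List::Cons(head, Rc::clone(tail)))
}

// Collect the values front-to-back for inspection.
fn to_vec(mut l: &Rc<List>) -> Vec<i32> {
    let mut out = Vec::new();
    while let List::Cons(v, rest) = l.as_ref() {
        out.push(*v);
        l = rest;
    }
    out
}
```

Two versions built on the same base list coexist, and the base itself is unchanged.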
If they are cache-friendly, then they can be very fast. The papers by Phil Bagwell (may he rest in peace) are very rich, with lots of implementation detail. Check them out!
Is the concern speed or memory? Because speed would seem kind of strange: after changing to the new architecture, they should see a strong decrease in messages being exchanged anyway, right?
Or are Go's maps inefficient? Genuinely curious.