Can someone explain the reasoning for using a HAMT here? I have played around with hashtables quite a bit, and although HAMTs are cool and a lot more memory efficient, they are likely not faster and will at some point suffer from more severe memory fragmentation than hashtables.
It is not clear from the article that the author needed a persistent data structure.
It's also, frankly, not clear they'd benefit from using a HAMT at all: the only measurements they present are of their problem domain, not of any implementation, and many of the asymptotics they cite are also true of hashmaps.
If their initial hashmap approach did have O(N) inserts because they copied the whole map for “immutability” every time, then sure, HAMTs will beat the pants off of it—but did it?
Sure. Map the HAMT to an index into a persistent vector or something, e.g. Scala's VectorMap and TreeSeqMap. You could also reuse the old "linked list map" idea, but thread the keys through: whenever you add a value to the map, add a triple of `(Some(last_key), value, None)` and update the last key's value so that its third item becomes the key of the new item.
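A minimal sketch of that triple scheme, using `std::collections::HashMap` as a mutable stand-in for the HAMT (the `OrderedMap` type and its methods are made up for illustration, and it assumes each key is only inserted once):

```rust
use std::collections::HashMap;

// Each entry stores (prev_key, value, next_key), so the map can be
// enumerated oldest-to-newest by following the `next` links.
struct OrderedMap {
    entries: HashMap<String, (Option<String>, i32, Option<String>)>,
    first: Option<String>,
    last: Option<String>,
}

impl OrderedMap {
    fn new() -> Self {
        OrderedMap { entries: HashMap::new(), first: None, last: None }
    }

    fn insert(&mut self, key: String, value: i32) {
        // The new entry points back at the current last key.
        let entry = (self.last.clone(), value, None);
        if let Some(last) = &self.last {
            // Update the old last entry's `next` to point at the new key.
            self.entries.get_mut(last).unwrap().2 = Some(key.clone());
        } else {
            self.first = Some(key.clone());
        }
        self.entries.insert(key.clone(), entry);
        self.last = Some(key);
    }

    // Walk from the oldest key to the newest via the `next` links.
    fn keys_in_order(&self) -> Vec<String> {
        let mut out = Vec::new();
        let mut cur = self.first.clone();
        while let Some(k) = cur {
            cur = self.entries[&k].2.clone();
            out.push(k);
        }
        out
    }
}
```

In a truly persistent HAMT the "update the last key's value" step would path-copy one extra branch per insert, but the linking idea is the same.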
>Sure. Map the HAMT to an index into a persistent vector or something e.g. Scala's VectorMap and TreeSeqMap
I do not have a persistent vector, only a HAMT.
Perhaps the HAMT becomes a standard trie if one sets hashcode(i) = i, and then it can be enumerated in ascending key order?
But the double lookup in two data structures is unnecessarily slow. The first HAMT could store (Index, Value) pairs and only use the "vector" for enumeration.
But I was hoping for a single data structure that can do lookups and ordered enumerations.
> You could also reuse the old "linked list map" idea, but thread the keys through: whenever you add a value to the map, add a triple of `(Some(last_key), value, None)` and update the last key's value so that its third item becomes the key of the new item.
I have been using a linked list as the sole storage so far. The lookup was O(n), which was fast enough for my data, but I wanted to improve it to be more scalable.
But enumerating the list is in reverse, is it not? Going from the newest key to the oldest key. I want it to go from the oldest to the newest key.
In any case it should be better to store pointers to HAMT nodes rather than keys.
> But the double lookup in two data structures is unnecessarily slow.
Not necessarily. You'd have 2xlog(n), and because you wouldn't be storing the data in the HAMT it could be denser / wider and thus fit better in cache. At least that's what happened with Raymond Hettinger's (non-persistent) naturally ordered dict: rather than a sparse array of buckets, the hashmap is a sparse array of indices plus a dense array of buckets, which gives much better memory density. Even more so with an adaptive sparse array, where the stored indices can be u8, u16 or usize depending on the number of items: most maps will only need a sparse array of u8, leading to even better memory behaviour for the most problematic part.
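For reference, a stripped-down sketch of that two-array layout (fixed table size, no deletion or resizing, u8 indices only; all names are hypothetical, this is not the CPython implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sparse array of small indices + dense array of entries:
// iteration order is insertion order, and the sparse part stays tiny.
struct CompactDict {
    indices: Vec<Option<u8>>,          // sparse: one byte per slot
    entries: Vec<(u64, String, i32)>,  // dense: (hash, key, value) in insertion order
}

impl CompactDict {
    fn new() -> Self {
        // 64 slots, power of two so masking works; a real table would resize.
        CompactDict { indices: vec![None; 64], entries: Vec::new() }
    }

    fn hash_of(key: &str) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish()
    }

    fn insert(&mut self, key: &str, value: i32) {
        let h = Self::hash_of(key);
        let mask = self.indices.len() - 1;
        let mut slot = (h as usize) & mask;
        // Linear probing over the sparse index array only.
        loop {
            match self.indices[slot] {
                None => {
                    self.indices[slot] = Some(self.entries.len() as u8);
                    self.entries.push((h, key.to_string(), value));
                    return;
                }
                Some(i) => {
                    let e = &mut self.entries[i as usize];
                    if e.0 == h && e.1 == key {
                        e.2 = value; // overwrite existing key in place
                        return;
                    }
                    slot = (slot + 1) & mask;
                }
            }
        }
    }

    fn get(&self, key: &str) -> Option<i32> {
        let h = Self::hash_of(key);
        let mask = self.indices.len() - 1;
        let mut slot = (h as usize) & mask;
        loop {
            match self.indices[slot] {
                None => return None,
                Some(i) => {
                    let e = &self.entries[i as usize];
                    if e.0 == h && e.1 == key {
                        return Some(e.2);
                    }
                    slot = (slot + 1) & mask;
                }
            }
        }
    }

    // Insertion-order iteration just walks the dense vector.
    fn keys(&self) -> Vec<&str> {
        self.entries.iter().map(|e| e.1.as_str()).collect()
    }
}
```

The probing touches only the byte-sized sparse array, so collisions stay cheap; the dense array is read once per hit.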
> But enumerating the list is in reverse, is it not? Going from the newest key to the oldest key. I want it to go from the oldest to the newest key.
Not if you have a doubly linked list, but that increases your costs in storage and update complexity.
> In any case it should be better to store pointers to HAMT nodes rather than keys.
If you're implementing the entire data structure then yes; I was rather assuming you were on the outside, working with an existing HAMT.
I have not read about them. Might take a while to implement.
I use Pascal, but it is hard to find Pascal libraries for anything: either they do not exist at all, or they have not been maintained for over a decade.
So if I want to use a data structure, I usually need to implement it myself.
>Not necessarily. You'd have 2xlog(n) and because you wouldn't be storing the data in the HAMT it could be denser / wider and thus better fit in cache.
If I reuse the HAMT as the persistent vector, it is not so good. There is no grouping: each node is allocated separately and could be anywhere in memory, so one lookup could mean five cache misses.
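As a rough sanity check on that number (hypothetical helper, assuming a 32-way trie that consumes 5 hash bits per level and one potential cache miss per separately allocated node on the path):

```rust
// Rough path length for a 32-way HAMT holding n keys:
// each level consumes 5 hash bits, so depth grows as log32(n).
fn hamt_depth(n: u64) -> u32 {
    let bits = 64 - n.leading_zeros(); // ~ceil(log2(n)) for n > 0
    (bits + 4) / 5                     // round up to whole 5-bit levels
}
```

So around a million entries the path is about 4 nodes deep, which with the root/entry indirection lines up with "could be five cache misses".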
>At least that's what happened with Raymond Hettinger's (non-persistent) naturally ordered dict: rather than a sparse array of buckets, the hashmap is a sparse array of indices and a dense array of buckets leading to much better memory density (even more so with an adaptive sparse array where the stored indices can be u8, u16 or usize depending on the number of items: most maps will only need a sparse array of u8 leading to even better memory behaviour for the most problematic part).
I use such a hashmap, too, for the insertion order (there are Pascal libraries with hashmaps, but even then I needed some weeks to customize one). But all the indices there are 32-bit. Perhaps I will add the adaptiveness to it, 16-bit at least. 8-bit seems hardly worth it; at most it can save 256 bytes.
>> But enumerating the list is in reverse, is it not? Going from the newest key to the oldest key. I want it to go from the oldest to the newest key.
>Not if you have a doubly linked list, but that increases your costs in storage and update complexity
But a doubly linked list would not be persistent.
Arbitrarily many new nodes could point to one old node, but the old node can only point to one of them.
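Compare the persistent singly linked (cons) list, where that asymmetry is exactly what makes sharing work: prepending leaves the old tail untouched. A minimal Rust sketch:

```rust
use std::rc::Rc;

// Persistent singly linked list: many new heads can share one old tail,
// and the old nodes never need a back-pointer (which is why a persistent
// doubly linked list is impossible without changing representation).
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

// Prepending shares the old tail instead of copying it.
fn prepend(tail: &Rc<List>, head: i32) -> Rc<List> {
    Rc::new(List::Cons(head, Rc::clone(tail)))
}

// Collect the values front-to-back for inspection.
fn to_vec(mut l: &Rc<List>) -> Vec<i32> {
    let mut out = Vec::new();
    while let List::Cons(v, rest) = l.as_ref() {
        out.push(*v);
        l = rest;
    }
    out
}
```

Two versions built on the same base list coexist, and the base itself is unchanged.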
If they are cache-friendly, then they can be very fast. The papers by Phil Bagwell (may he rest in peace) are very rich, with lots of implementation detail. Check them out!
Is the concern speed or memory? Because speed would seem kind of strange: after changing to the new architecture, they should see a strong decrease in messages being exchanged anyway, right?
Or are Go's maps inefficient? Genuinely curious.