A six-fold improvement in the Merkle tree storage for Tezos

mleonhard · on Feb 24, 2022

> These indexes often are just a sequence of ordered entries, but new entries are added in random order. This means that from time to time, you need to sort the index.
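
As a rough illustration of the kind of index the quote describes (not TezEdge's actual code), here is a minimal sketch: a sorted run plus an unsorted tail of recent inserts that gets merged and re-sorted from time to time. The class name `LazySortedIndex`, the `max_unsorted` threshold, and the integer keys are all assumptions made up for this example.

```python
import bisect


class LazySortedIndex:
    """Hypothetical index: a sorted run plus an unsorted tail of recent inserts."""

    def __init__(self, max_unsorted=4):
        self.sorted_keys = []      # always kept in ascending order
        self.unsorted_keys = []    # recent inserts, in arrival order
        self.max_unsorted = max_unsorted  # assumed threshold that triggers a re-sort

    def insert(self, key):
        # New entries arrive in random order, so they are only appended here.
        self.unsorted_keys.append(key)
        if len(self.unsorted_keys) >= self.max_unsorted:
            self.compact()

    def compact(self):
        # The periodic "sort the index" step: merge the tail into the sorted run.
        self.sorted_keys = sorted(self.sorted_keys + self.unsorted_keys)
        self.unsorted_keys = []

    def contains(self, key):
        # Binary search over the sorted run, then a linear scan of the tail.
        i = bisect.bisect_left(self.sorted_keys, key)
        if i < len(self.sorted_keys) and self.sorted_keys[i] == key:
            return True
        return key in self.unsorted_keys
```

The cost of this design is that every `compact` call re-sorts the whole index, which is presumably acceptable only while the index stays small enough to fit in memory.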

This seems like a crude index. Why didn't they start out using B-epsilon (Bε) trees [0]?

> The indexes we use in the TezEdge v1.15 only store references to the commits special objects in the storage ... They are small enough to be completely loaded and sorted in-memory ...

If they can store their index in RAM then why did they write a custom storage engine? Why don't they just use Postgres?

[0] https://news.ycombinator.com/item?id=29403320

infogulch · on Feb 24, 2022

The y-axes on these graphs are suspect, if not a smoking gun. E.g. memory usage "before" ranged from 0-6GB, and "after" ranged from 0-20GB (!); eyeballing it, "after" might still be lower, but the y-axis shenanigans don't give me much confidence.

tizoc · on Feb 24, 2022

From the article:

> Please note that the large spike in the TezEdge v1.15 RAM graph is caused by the update to protocol 011 Hangzhou. This included a major restructuring of the context tree, which is a very expensive operation. While the new representation of the context tree is better, it takes a while to migrate the past version’s tree to the new one.

The difference in range is caused by that spike (the "before" screenshot is from a node that had not yet reached that point).

Edit: you can also see the actual usage in the tooltip (and before the spike and protocol switch, the memory usage for the TezEdge one was even lower)

nybble41 · on Feb 24, 2022

If you look at the current numbers in the upper-right corner of the graph, validator RAM use went from 3816 MB to 2956 MB, a reduction of ~860 MB. The y-axis was selected by the tool based on the data in each series, and the second graph happens to include a spike above 10 GB caused by protocol migration; I doubt the author was being deliberately manipulative. (I do agree that it would have been better to present the data without that spike for a closer comparison.)

vishakh82 · on Feb 24, 2022

Fantastic work. A great piece of engineering!

ForHackernews · on Feb 24, 2022

Huh, maybe the pointless speculative waste of cryptocurrency will produce actual engineering improvements as by-products. At least Merkle trees have real applications, unlike ASIC bitcoin miners.

baby · on Feb 24, 2022

Wow. What a useful comment.