It was a cleartext signature, not a detached signature.
Edit: even better. It was both. There is a signature type confusion attack going on here. I still haven't watched the entire thing, but it seems that, unlike gpg, Sequoia requires you to pass --cleartext explicitly, so there is no confusion in that case.
In the appendix of the survey, there are 3 references on dynamic perfect hashing. I think the only actual implementation of a dynamic PHF is a variant of perfect hashing through fingerprinting in the paper "Perfect Hashing for Network Applications". However, that implementation is not fully dynamic and needs to be rebuilt if the key set changes too much.
All of those modern algorithms, even relatively older ones like CHD, can find a perfect hash function over millions of keys in less than a second.[1] Periodically rebuilding a function can be more than fast enough, depending on your use case.
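For what it's worth, the rebuild pattern looks roughly like this (my own illustration, with a hypothetical build_phf step standing in for CHD/PTHash/whatever, not code from any of the tools discussed): lookups are served by the static function plus a small overflow set, and the function is rebuilt once the overflow grows.

    #include <string>
    #include <unordered_set>
    #include <vector>

    // Sketch of periodic rebuilding around a static PHF: new keys go into a
    // small overflow set, and the PHF is rebuilt from scratch once the
    // overflow passes a threshold.
    struct RebuildingKeySet {
        std::vector<std::string> baked;            // keys covered by the current PHF
        std::unordered_set<std::string> overflow;  // keys added since the last build
        size_t rebuild_threshold = 100000;         // tune to how fast rebuilds are

        void insert(const std::string& key) {
            overflow.insert(key);
            if (overflow.size() >= rebuild_threshold) rebuild();
        }

        void rebuild() {
            baked.insert(baked.end(), overflow.begin(), overflow.end());
            overflow.clear();
            // build_phf(baked);  // hypothetical build step, seconds even for millions of keys
        }
    };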
Last time I tried gperf, 8-10 years ago, it took hours or even days to build a hash function that CHD could do in seconds or less. If someone's idea of perfect hash function construction cost is gperf (at least gperf circa 2015)... welcome to the future.
Addition: BBHash, in turn, is a re-implementation of FiPHa (perfect hashing through fingerprinting). There are quite a few re-implementations of FiPHa: BBHash, Boomph, FiPS, FMPH, etc. As shown in the survey, BBHash is by far the slowest. Even though it implements exactly the same algorithm, FMPH is much faster; its paper [1] also compares against Boomph. The beauty of the fingerprinting technique is that it is super simple, which is probably why there are so many implementations of it.
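For anyone curious, here is a rough sketch of the fingerprinting construction as I read FiPHa/BBHash: each level marks the slots that received exactly one key, colliding keys are bumped to a smaller next level, and the final hash value is the rank of the key's bit across all levels. The per-level hash and the linear-scan rank() below are crude placeholders; real implementations use a proper seeded hash family and succinct rank structures.

    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    struct FingerprintMphf {
        std::vector<std::vector<bool>> levels;  // one bit per slot, per level
        std::vector<uint64_t> offsets;          // set bits in all earlier levels

        static uint64_t level_hash(const std::string& key, uint64_t level) {
            // crude level-seeded hash, just for the sketch
            return std::hash<std::string>{}(key + '#' + std::to_string(level));
        }

        void build(std::vector<std::string> keys, double gamma = 2.0) {
            for (uint64_t level = 0; !keys.empty(); ++level) {
                size_t m = std::max<size_t>(1, size_t(gamma * keys.size()));
                std::vector<uint32_t> count(m, 0);
                for (const auto& k : keys) count[level_hash(k, level) % m]++;
                std::vector<bool> bits(m, false);
                std::vector<std::string> bumped;  // collided keys go to the next level
                for (auto& k : keys) {
                    size_t pos = level_hash(k, level) % m;
                    if (count[pos] == 1) bits[pos] = true;
                    else bumped.push_back(std::move(k));
                }
                offsets.push_back(levels.empty()
                    ? 0 : offsets.back() + rank(levels.back(), levels.back().size()));
                levels.push_back(std::move(bits));
                keys = std::move(bumped);
            }
        }

        // slot in [0, n) for keys from the build set; arbitrary for other keys
        uint64_t operator()(const std::string& key) const {
            for (size_t level = 0; level < levels.size(); ++level) {
                size_t pos = level_hash(key, level) % levels[level].size();
                if (levels[level][pos]) return offsets[level] + rank(levels[level], pos);
            }
            return 0;
        }

        // number of set bits before `pos` (linear scan; real code uses rank/select)
        static uint64_t rank(const std::vector<bool>& bits, size_t pos) {
            uint64_t r = 0;
            for (size_t i = 0; i < pos; ++i) r += bits[i];
            return r;
        }
    };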
Many modern perfect hash functions are super close to the space lower bound, often with just 3% to 50% overhead. So your claim that the space consumption is "twice as low" is information-theoretically impossible. With gperf, the space consumption sits in the machine code instead of a data structure, but it's definitely not free. In fact, I'm pretty sure that the size of the machine code generated by gperf is very far from optimal. The only exception is tiny input sets with just a couple of hundred keys, where gperf probably wins due to lower constant overheads.
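To put rough numbers on that (back-of-the-envelope, using the well-known lower bound of log2(e) ≈ 1.44 bits per key for a minimal PHF):

    lower bound (minimal PHF):  ~1.44 bits/key
    3% overhead:                ~1.49 bits/key
    50% overhead:               ~2.16 bits/key
    half of 1.49-2.16:          ~0.74-1.08 bits/key, i.e. below the lower bound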
My particular use case is, for example, Unicode property checks, which involve millions of keys and are far beyond what gperf can handle. Other cases are integer keys.
I'm certainly not willing to load the keys and MPHF data at query time from disk, since they are known in advance and can be compiled to C or C++ code ahead of time. That gives an instant load time, as opposed to the costly deserialization in all your tools.
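To illustrate what I mean by compiling the function (a made-up generator step, not the output of any particular tool): the offline build packs the MPHF's tables into static C arrays, so at run time there is nothing to parse, the OS just maps the binary.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Hypothetical generator step: pack one bit vector of the MPHF into bytes
    // and emit it as a static C array. The lookup code links against this
    // table directly instead of deserializing a file at startup.
    void emit_c_table(const std::vector<bool>& bits, std::ostream& out) {
        out << "static const unsigned char mphf_level0_bits[] = {\n    ";
        unsigned byte = 0;
        for (size_t i = 0; i < bits.size(); ++i) {
            if (bits[i]) byte |= 1u << (i % 8);
            if (i % 8 == 7 || i + 1 == bits.size()) {
                out << byte << ", ";
                byte = 0;
            }
        }
        out << "\n};\n";
    }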
Your deserialization overhead is not accounted for in the space numbers, and neither is the storage cost of the false-positive check. It's rather academic, not practical.
See, everybody can see how you cheat in your benchmarks. It's unrealistic to measure only the perfect hash function while discarding the cost of deserialization and the false-positive checks.
That's always the case when dealing with you academics.
We don't care how you define your costs; we just laugh at that. In reality you have to load the keys, load the data structures, and check for existence. You even discard the existence checks by omitting the false-positive checks. Looking up a key that is not in the set will still produce a hit in your function. Only very rarely do you know in advance whether your key is in the set.
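Concretely, this is the kind of wrapper you need around any MPHF if you want real membership answers (sketch only; phf here is a hypothetical function built with CHD/BBHash/whatever). The per-slot keys, or at least per-slot fingerprints, that it has to store are exactly the cost being left out.

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    // Wrapping an MPHF into an exact membership check. `phf` is assumed to map
    // every build-time key to a unique slot in [0, keys.size()); for foreign
    // keys it returns some arbitrary slot, which is why the comparison is needed.
    struct ExactSet {
        std::function<uint64_t(const std::string&)> phf;  // hypothetical MPHF
        std::vector<std::string> slot_key;                // key stored at its slot

        ExactSet(std::function<uint64_t(const std::string&)> f,
                 const std::vector<std::string>& keys)
            : phf(std::move(f)), slot_key(keys.size()) {
            for (const auto& k : keys) slot_key[phf(k)] = k;
        }

        bool contains(const std::string& key) const {
            uint64_t slot = phf(key);
            // Without this comparison, every non-member would look like a hit.
            return slot < slot_key.size() && slot_key[slot] == key;
        }
    };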