I think older processors used to have a slower implementation for shifts, which made this slower.
Nowadays swisstable and other similar hashtables use the top bits and simd/swar techniques to quickly filter out collisions after determining the starting bucket.
Nowadays swisstable and other similar hashtables use the top bits and simd/swar techniques to quickly filter out collisions after determining the starting bucket.