Fwiw I don't think there's anything inherently sequential about the Keccak permutation itself. KangarooTwelve is a fully parallelizable hash built on Keccak. (Though they did use the sponge construction on the XOF side, so that part is serial.)
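To make the "parallelizable" point concrete, here's a rough sketch of the tree-hashing idea. To be clear, this is not KangarooTwelve: the real thing uses a 12-round Keccak permutation and its own chunking and encoding rules. I'm substituting stock SHAKE128 from `hashlib`, and `CHUNK`, `leaf_digest` and `tree_hash_sketch` are names I made up for illustration. The only point is that the leaves are independent, so they can be hashed concurrently, and just the final node is a serial sponge.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 8192  # leaf size in bytes; an assumption for this sketch


def leaf_digest(chunk: bytes) -> bytes:
    # Each leaf is hashed on its own, with no dependency on other leaves.
    return hashlib.shake_128(chunk).digest(32)


def tree_hash_sketch(data: bytes, out_len: int = 32, workers: int = 4) -> bytes:
    # Split the input into fixed-size chunks (a single empty chunk for empty input).
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    # Hash the leaves concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leaves = list(pool.map(leaf_digest, chunks))
    # Final node: one ordinary (serial) sponge over the leaf digests
    # plus the chunk count. This encoding is made up, not K12's.
    final = hashlib.shake_128(b"".join(leaves) + len(chunks).to_bytes(8, "big"))
    return final.digest(out_len)
```

The output doesn't depend on `workers`, which is exactly the property a serial sponge over the whole input can't give you.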
I meant the absorb and squeeze part. The permutation itself (or more specifically, a single round of it) can be implemented efficiently in hardware, but you can't hide its latency by applying the permutation in parallel, because each call takes the previous call's output state as its input. Yes, KangarooTwelve is an improvement in this regard, but the grandparent was talking specifically about Keccak/SHA-3.
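Here's a small pure-Python sketch of what I mean by the serial chain, written as a SHA3-256-style sponge (rate 136 bytes, padding per FIPS 202). It's meant for illustration only, not for real use, and the names `keccak_f1600`, `sha3_256_sketch` and `RATE` are my own. Look at the absorb loop: block N can't be processed until the permutation for block N-1 has finished.

```python
import hashlib

MASK64 = (1 << 64) - 1
RATE = 136  # SHA3-256 rate in bytes; the remaining 64 bytes are capacity


def rol64(v: int, n: int) -> int:
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64


def keccak_f1600(state: bytearray) -> bytearray:
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    a = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
          for y in range(5)] for x in range(5)]
    r = 1  # LFSR state used to derive the round constants
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        cur = a[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x][y] = a[x][y], rol64(cur, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied row by row
        for y in range(5):
            row = [a[x][y] for x in range(5)]
            for x in range(5):
                a[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                a[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = a[x][y].to_bytes(8, "little")
    return out


def sha3_256_sketch(msg: bytes):
    """Return (digest, number of permutation calls) for a SHA3-256-style sponge."""
    # Pad with the SHA-3 domain bits (0x06) and the final 1 bit (0x80).
    pad_len = RATE - len(msg) % RATE
    if pad_len == 1:
        padded = msg + b"\x86"
    else:
        padded = msg + b"\x06" + b"\x00" * (pad_len - 2) + b"\x80"
    state = bytearray(200)
    calls = 0
    for off in range(0, len(padded), RATE):
        for i in range(RATE):
            state[i] ^= padded[off + i]
        # Serial dependency: this call needs the state left by the previous one.
        state = keccak_f1600(state)
        calls += 1
    return bytes(state[:32]), calls
```

A 300-byte message pads out to three 136-byte blocks, so it costs three permutation calls, and they have to run back to back. The digest should match `hashlib.sha3_256`, which is how I'd sanity-check a sketch like this.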
Sorry for the lack of clarity, but I was saying “Keccak” and not “SHA-3” for that specific reason: it’s a permutation building block suitable for a whole range of constructions (cSHAKE, KangarooTwelve, etc.). SHA-3 specifically is overkill and unnecessary, imho.
cSHAKE128 is a much better replacement for HMAC and SHA-512 in (zk) proofs, while KangarooTwelve is the better fit for things like FDE and massive volumes of data.