Do you have perhaps some links/references on that? I have once tried benchmarkin...

Do you have perhaps some links/references on that?

I have once tried benchmarking it by writing a tiny VM interpreter and a corresponding threaded one with direct jumps in Zig (which can force inline a call, so I could do efficient direct jumps) and I have - to me surprisingly- found that the naive while-switch loop was faster, even though the resulting assembly of the second approach seemed right.

I wasn’t sure if I saw it only due to my tiny language and dumb example program, or if it’s something deeper. E.g. the JVM does use direct threaded code for their interpreter.