(As the article claims) even with computed goto, register assignment of the most frequently used variables is fragile because the CFG of the function is so complicated.
Register assignment is much less fragile when each function is small and takes the most important variables by argument.
It's also fragile, in a different way, if you're threading state through tail calls.
In my experience writing computed goto interpreters, this isn't a problem unless you have more state than what can be stored in registers. But then you'd also have that same problem with tail calls.
Fallback paths most definitely have more state than what can be stored in registers. Fallback paths will do things like allocate memory, initialize new objects, perform complicated fallback logic, etc. These fallback paths will inevitably spill the core interpreter state.
The goal is for fast paths to avoid spilling core interpreter state. But the compiler empirically has a difficult time doing this when the CFG is too connected. If you give the compiler an op at a time, each in its own function, it generally does much better.
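To make the "one op per function" shape concrete, here is a minimal sketch in C. The bytecode (`OP_PUSH`, `OP_ADD`, `OP_HALT`), the handler names, and `run_tailcall` are all made up for illustration and are not taken from any real interpreter. It assumes a compiler that accepts `__attribute__((musttail))` on a return statement (recent Clang, newer GCC); elsewhere the macro expands to nothing and the tail call is left to the optimizer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bytecode: one opcode byte, OP_PUSH is followed by one operand byte. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Ask for a guaranteed tail call where the compiler supports the attribute. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* no guarantee here: tail-call elimination is up to the optimizer */
#endif

/* The core interpreter state (pc, sp) is passed as arguments to every handler,
   so each handler starts with that state in argument registers. */
typedef int64_t (*op_fn)(const uint8_t *pc, int64_t *sp);

static int64_t op_push(const uint8_t *pc, int64_t *sp);
static int64_t op_add(const uint8_t *pc, int64_t *sp);
static int64_t op_halt(const uint8_t *pc, int64_t *sp);

static const op_fn dispatch_table[] = {
    [OP_PUSH] = op_push,
    [OP_ADD]  = op_add,
    [OP_HALT] = op_halt,
};

/* Jump to the handler for the next opcode, handing the state along. */
#define DISPATCH() MUSTTAIL return dispatch_table[*pc](pc, sp)

static int64_t op_push(const uint8_t *pc, int64_t *sp) {
    *sp++ = pc[1];          /* push the immediate operand */
    pc += 2;
    DISPATCH();
}

static int64_t op_add(const uint8_t *pc, int64_t *sp) {
    sp[-2] += sp[-1];       /* pop two values, push their sum */
    sp--;
    pc += 1;
    DISPATCH();
}

static int64_t op_halt(const uint8_t *pc, int64_t *sp) {
    (void)pc;
    return sp[-1];          /* result is the top of the stack */
}

/* Entry point; the fixed 64-slot stack has no overflow check in this sketch. */
int64_t run_tailcall(const uint8_t *code) {
    int64_t stack[64];
    assert(code != NULL);
    return dispatch_table[*code](code, stack);
}
```

Each handler is a tiny function with a simple CFG, which is the property being argued for above. Whether the state actually stays in registers still depends on the compiler and target, so checking the generated assembly is the only real test.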
I get that and that’s also been my experience, just not for interpreters.
In interpreters, my experience is that fallback paths are well behaved if you just make them noinline and then ensure that the amount of interpreter state is small enough to fit in callee save regs.
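A rough sketch of that noinline discipline in a computed goto loop, in C with GNU extensions (`&&label`, `goto *`, `__attribute__((noinline))`, `__builtin_expect`), which GCC and Clang accept. The opcodes, `vm_stack`, `grow_stack`, and `run_goto` are hypothetical names invented for this example. The idea is that the rare path that allocates lives in its own out-of-line function, so the hot loop only has to keep `pc`, `sp`, and `end` live.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bytecode: VM_PUSH is followed by one operand byte. */
enum { VM_PUSH, VM_ADD, VM_HALT };

typedef struct {
    int64_t *base;  /* start of the heap-allocated operand stack */
    size_t   cap;   /* capacity in slots */
} vm_stack;

/* Fallback path: allocates, so it is kept out of line. The call itself is
   the only thing the fast path sees of it. */
__attribute__((noinline))
static int64_t *grow_stack(vm_stack *st, int64_t *sp) {
    size_t used = (size_t)(sp - st->base);
    size_t new_cap = st->cap * 2;
    int64_t *p = realloc(st->base, new_cap * sizeof *p);
    if (p == NULL)
        abort();
    st->base = p;
    st->cap = new_cap;
    return p + used;        /* new stack pointer at the same depth */
}

int64_t run_goto(const uint8_t *pc) {
    /* Label addresses indexed by opcode (GNU "labels as values"). */
    static void *const labels[] = { &&do_push, &&do_add, &&do_halt };
    vm_stack st = { malloc(4 * sizeof(int64_t)), 4 };
    int64_t *sp, *end, result;

    assert(pc != NULL);
    if (st.base == NULL)
        abort();
    sp = st.base;
    end = st.base + st.cap;

#define NEXT() goto *labels[*pc]
    NEXT();

do_push:
    if (__builtin_expect(sp == end, 0)) {   /* rare: stack is full */
        sp = grow_stack(&st, sp);
        end = st.base + st.cap;
    }
    *sp++ = pc[1];
    pc += 2;
    NEXT();

do_add:
    sp[-2] += sp[-1];       /* pop two values, push their sum */
    sp--;
    pc += 1;
    NEXT();

do_halt:
    result = sp[-1];
    free(st.base);
    return result;
#undef NEXT
}
```

With only three pointers of live state, callee-saved registers are enough to carry them across the `grow_stack` call on typical ABIs. This is a sketch of the pattern, not a measurement, and a real interpreter with more state may behave differently.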
There are a bunch of arguments in there that don't match my experience, which includes the JSC interpreter. JSC had an interpreter written in C++ and one written in assembly, and the main reason for using the assembly one was not raw perf - it was so the interpreter knows the JIT's ABI for fast JIT<->interpreter calls.
Mike's argument about control flow diamonds being bad for optimization is especially questionable. It's only bad if one branch of the diamond uses a lot more registers than the other, which, as I said, can be fixed by using noinline.
As I mentioned in another part of the thread - the way you get that under control in a computed goto interpreter (or a switch loop interpreter) is careful use of noinline.
Also, it probably depends a lot on what you’re interpreting. I’ve written, and been tasked with maintaining, computed goto interpreters for quite a few systems and the top problem was always the branches and never the register pressure. My guess is it’s because all of those systems had good noinline discipline, but it could also just be how things fell out for other reasons.