(As the article claims) even with computed goto register assignment of the most ...

pizlonator · 2024-08-19T17:09:51 1724087391

It's also fragile, in a different way, if you're threading state through tail calls.

In my experience writing computed goto interpreters, this isn't a problem unless you have more state than what can be stored in registers. But then you'd also have that same problem with tail calls.

haberman · 2024-08-19T18:19:39 1724091579

Fallback paths most definitely have more state than what can be stored in registers. Fallback paths will do things like allocate memory, initialize new objects, perform complicated fallback logic, etc. These fallback paths will inevitably spill the core interpreter state.

The goal is for fast paths to avoid spilling core interpreter state. But the compiler empirically has a difficult time doing this when the CFG is too connected. If you give the compiler an op at a time, each in its own function, it generally does much better.
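A minimal sketch of the "one op per function" shape, for illustration only. The toy bytecode, the `MUSTTAIL` and `DISPATCH` macros, and every function name here are hypothetical and not taken from any real interpreter. It assumes a compiler that accepts GNU-style attributes; where `musttail` is unavailable the macro falls back to an ordinary call, which is fine for short programs but no longer guarantees constant stack use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy bytecode, invented for this sketch. */
enum { OP_INC, OP_ADDI, OP_RET, OP_COUNT };

/* Use a guaranteed tail call when the compiler offers one. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* assumption: plain call is acceptable for short programs */
#endif

/* Core interpreter state (pc, acc) is passed as arguments, so the calling
   convention places it in registers at every handler boundary. */
typedef int64_t op_fn(const uint8_t *pc, int64_t acc);

static op_fn op_inc, op_addi, op_ret;
static op_fn *const dispatch_table[OP_COUNT] = { op_inc, op_addi, op_ret };

#define DISPATCH() MUSTTAIL return dispatch_table[*pc](pc, acc)

/* Each handler is a small function with its own, simple control flow graph. */
static int64_t op_inc(const uint8_t *pc, int64_t acc) {
    acc += 1;
    pc += 1;
    DISPATCH();
}

static int64_t op_addi(const uint8_t *pc, int64_t acc) {
    acc += pc[1];   /* one-byte immediate operand */
    pc += 2;
    DISPATCH();
}

static int64_t op_ret(const uint8_t *pc, int64_t acc) {
    (void)pc;
    return acc;
}

/* Entry point; assumes well-formed bytecode that ends in OP_RET. */
int64_t run_tail(const uint8_t *code) {
    assert(code && *code < OP_COUNT);
    return dispatch_table[*code](code, 0);
}
```

For example, running `run_tail` on the bytes `{ OP_INC, OP_ADDI, 40, OP_INC, OP_RET }` returns 42. Because each handler is compiled on its own, a slow path inside one handler can only cause spills within that handler, not across the whole dispatch loop.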

pizlonator · 2024-08-19T18:30:45 1724092245

I get that and that’s also been my experience, just not for interpreters.

In interpreters, my experience is that fallback paths are well behaved if you just make them noinline and then ensure that the amount of interpreter state is small enough to fit in callee save regs.
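A rough sketch of that pattern, under assumptions: a toy stack machine whose hot state is three pointers (`pc`, `sp`, `limit`) and whose only fallback is growing the stack. The opcodes, the `Stack` struct, and `grow_stack` are hypothetical names made up for this example. It relies on the GNU labels-as-values extension (GCC, Clang) and does no bounds checking on the bytecode.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy bytecode, invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_HALT };

typedef struct { int64_t *base; size_t cap; } Stack;

/* Fallback path, kept out of line so that the registers it needs for the
   realloc call are not counted against the dispatch loop. */
__attribute__((noinline))
static int64_t *grow_stack(Stack *s, int64_t *sp) {
    size_t used = (size_t)(sp - s->base);
    size_t new_cap = s->cap * 2;
    int64_t *p = realloc(s->base, new_cap * sizeof(int64_t));
    if (!p) abort();
    s->base = p;
    s->cap = new_cap;
    return p + used;   /* new stack pointer inside the moved buffer */
}

/* Assumes well-formed bytecode: no stack underflow, ends in OP_HALT. */
int64_t run_goto(const uint8_t *pc) {
    static void *const labels[] = { &&do_push, &&do_add, &&do_halt };
    Stack s = { malloc(2 * sizeof(int64_t)), 2 };
    assert(s.base != NULL);

    /* Hot state: small enough to live in callee-saved registers. */
    int64_t *sp = s.base;
    int64_t *limit = s.base + s.cap;
    int64_t result;

#define NEXT() goto *labels[*pc]
    NEXT();

do_push:
    if (sp == limit) {                 /* rare: take the out-of-line path */
        sp = grow_stack(&s, sp);
        limit = s.base + s.cap;
    }
    *sp++ = pc[1];                     /* one-byte immediate operand */
    pc += 2;
    NEXT();

do_add:
    sp[-2] += sp[-1];
    sp--;
    pc += 1;
    NEXT();

do_halt:
    result = sp[-1];
    free(s.base);
    return result;
#undef NEXT
}
```

Running `run_goto` on `{ OP_PUSH, 1, OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_ADD, OP_HALT }` returns 6, and the third push goes through `grow_stack` because the initial capacity is 2. Whether the compiler actually keeps `pc`, `sp`, and `limit` in registers across the call is up to its register allocator; the sketch only shows the code shape being described.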

haberman · 2024-08-19T18:48:08 1724093288

Mike Pall makes an argument that interpreters are especially susceptible to this problem, and I find it convincing, since it matches my experience: https://web.archive.org/web/20180331141631/http://article.gm...

pizlonator · 2024-08-19T18:58:56 1724093936

There are a bunch of arguments in there that don't match my experience, which includes the JSC interpreter. JSC had an interpreter written in C++ and one written in assembly, and the main reason for using the assembly one was not raw perf - it was so the interpreter knows the JIT's ABI for fast JIT<->interpreter calls.

Mike's argument about control flow diamonds being bad for optimization is especially questionable. It's only bad if one branch of the diamond uses a lot more registers than the other, which as I said, can be fixed by using noinline.

ufo · 2024-08-19T19:19:08 1724095148

Exactly. Computed goto helps with branch prediction, but does not help w.r.t. register allocation and other compiler optimizations.

pizlonator · 2024-08-19T21:14:17 1724102057

As I mentioned in another part of the thread - the way you get that under control in a computed goto interpreter (or a switch loop interpreter) is careful use of noinline.

Also, it probably depends a lot on what you’re interpreting. I’ve written, and been tasked with maintaining, computed goto interpreters for quite a few systems and the top problem was always the branches and never the register pressure. My guess is it’s because all of those systems had good noinline discipline, but it could also just be how things fell out for other reasons.