One thing I've been thinking about in C++ land is just how much the idiomatic use of RAII prevents the compiler from doing its own tail call optimization. Any object in automatic storage with a non-trivial destructor basically guarantees the compiler _can't_ emit a tail call. It's rather unfortunate, but perhaps worth it if the tradeoff is well understood. Example: https://godbolt.org/z/9WcYnc8YT
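A minimal sketch of what I mean (Tracer is a made-up RAII type with an observable destructor; the names are mine, not taken from the godbolt link):

    #include <cstdio>

    // Made-up RAII type with an observable side effect in its destructor,
    // so the compiler cannot discard or reorder the cleanup.
    struct Tracer {
        ~Tracer() { std::puts("cleanup"); }
    };

    long sum_to(long n, long acc = 0) {
        Tracer t;                       // non-trivial destructor
        if (n == 0) return acc;
        return sum_to(n - 1, acc + n);  // ~Tracer must run *after* this call returns,
                                        // so it isn't in tail position and every level
                                        // of recursion keeps its own stack frame alive
    }

    int main() { return static_cast<int>(sum_to(5)); }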
You can still use this trick in C++ if you ensure that the return statements are outside the scopes that have the RAII objects. It's awkward but it's better than nothing. https://godbolt.org/z/cnbjaK85T
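Something like this, I think (same kind of made-up Tracer type standing in for an arbitrary RAII object):

    #include <cstdio>

    struct Tracer {
        ~Tracer() { std::puts("cleanup"); }
    };

    long sum_to(long n, long acc) {
        {
            Tracer t;           // RAII object confined to an inner scope
            if (n == 0)
                return acc;     // base case; doesn't need to be a tail call
            acc += n;
        }                       // ~Tracer has already run by this point
        return sum_to(n - 1, acc);  // nothing left to do afterwards: a real tail call
    }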
I'm having trouble understanding this comment. You're saying that you can't generate a tail call when returning from the middle of a function that needs to clean things up? RAII is merely syntactic sugar for writing that control flow and making it mandatory.
Perhaps it's easier to think of tail-call semantics as simply implementing iteration: it's another way of expressing a for(...) loop. And if you used RAII in the body of your for loop you would expect the same thing.
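Rough sketch of the analogy (made-up example):

    #include <string>

    // The RAII object in the body is destroyed at the end of each iteration,
    // i.e. before control jumps back to the top of the loop -- which is
    // exactly the ordering a tail call has to arrange as well.
    long sum_to(long n) {
        long acc = 0;
        for (; n != 0; --n) {
            std::string scratch(64, 'x');  // non-trivial destructor, runs every iteration
            acc += n;
        }
        return acc;
    }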
I can understand the comment: even when the cleanup could be done before the call, which would allow the call to be made as a tail call, the compiler won't move the cleanup code up for you, and running the cleanup last is the default. Local variables defined inside a for loop, by contrast, are always destroyed before the next iteration.
Absolutely. RAII is an abstraction (and a useful one), but it has a cost: it prevents a useful optimization, because cleanup is required when the stack frame is torn down. You'd expect the same in C if you explicitly had to call a cleanup function after the call returns.
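Roughly the desugared picture, with made-up acquire/cleanup functions standing in for a constructor/destructor pair:

    #include <cstdlib>

    struct Resource { int* data; };

    static void acquire(Resource* r) { r->data = static_cast<int*>(std::malloc(64)); }
    static void cleanup(Resource* r) { std::free(r->data); }

    long sum_to(long n, long acc) {
        Resource r;
        acquire(&r);
        if (n == 0) {
            cleanup(&r);
            return acc;
        }
        long result = sum_to(n - 1, acc + n);  // can't be a jump:
        cleanup(&r);                           // cleanup still has to run afterwards
        return result;
    }

Written out like this it's obvious why the call isn't in tail position; RAII just hides the explicit cleanup call.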
What C++ does with RAII is make this tradeoff non-obvious. std::unique_ptr is a great example: colloquially a std::unique_ptr is "just a pointer", but not in this case, because its non-trivial destructor prevents TCO.
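For instance (a sketch; step() is just an assumed external function standing in for whatever gets tail-called, so the compiler can't inline it away):

    #include <memory>

    int step(int x);  // assumed to be defined in another translation unit

    // "Just a pointer", but not for TCO purposes: ~unique_ptr has to run
    // after step() returns, so the call is not in tail position.
    int with_smart_ptr(int x) {
        auto p = std::make_unique<int>(x);
        return step(*p);
    }

    // Freeing up front leaves nothing to do after the call, so the compiler
    // is free to emit it as a jump.
    int with_raw_ptr(int x) {
        int* p = new int(x);
        int v = *p;
        delete p;
        return step(v);
    }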
These tail-call functions are part of a program’s inner loop. It seems like we shouldn’t expect allocation within an inner loop to be fast? Other than local variables, that is.
In an interpreter, it seems like you're either allocating outside the interpreter loop (the lifetime is that of the interpreter itself), or within a particular (slow) instruction, or on behalf of the language being interpreted, where the lifetime can't be handled by RAII. There will be fast instructions that don't allocate and slower ones that do.
Interpreters are a bit weird: there are lots of instructions that do almost nothing, so the overhead of the dispatch loop is significant, yet there's a lot of complication inside the loop and little ability to predict which instruction comes next. That's unlike many loops, which can simply be unrolled.
I suppose the compiler could reorder function calls if it can prove there is no change in behavior? If so, then it could hoist dtors above the call and emit a jump. I doubt any compilers do this.
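What that hoisted version would have to look like, written by hand (hypothetical step() standing in for the tail-called function):

    #include <memory>

    int step(int x);  // hypothetical callee, defined elsewhere

    // As written: ~unique_ptr runs after step() returns, so no tail call.
    int original(int x) {
        auto p = std::make_unique<int>(x);
        return step(*p);
    }

    // The reordering the compiler would have to prove safe: cleanup hoisted
    // above the call. It's only valid if step() can never observe *p (e.g.
    // through a stashed copy of the pointer), which is exactly the proof
    // obligation that makes this a hard transformation in general.
    int reordered(int x) {
        int v;
        {
            auto p = std::make_unique<int>(x);
            v = *p;
        }                  // delete happens here, before the call
        return step(v);    // now a genuine tail call
    }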