One thing I've been thinking about in C++ land is just how much the idiomatic use of RAII prevents the compiler from doing its own tail call optimization. Any object in automatic storage with a non-trivial destructor basically guarantees the compiler _can't_ emit a tail call. It's rather unfortunate, but perhaps worth it if the tradeoff is well understood. Example: https://godbolt.org/z/9WcYnc8YT
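A minimal sketch of what I mean (Tracer is a made-up RAII type with an observable destructor; the names are mine, not taken from the godbolt link):

    #include <cstdio>

    // Made-up RAII type with an observable side effect in its destructor,
    // so the compiler cannot discard or reorder the cleanup.
    struct Tracer {
        ~Tracer() { std::puts("cleanup"); }
    };

    long sum_to(long n, long acc = 0) {
        Tracer t;                       // non-trivial destructor
        if (n == 0) return acc;
        return sum_to(n - 1, acc + n);  // ~Tracer must run *after* this call returns,
                                        // so it isn't in tail position and every level
                                        // of recursion keeps its own stack frame alive
    }

    int main() { return static_cast<int>(sum_to(5)); }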
You can still use this trick in C++ if you ensure that the return statements are outside the scopes that have the RAII objects. It's awkward but it's better than nothing. https://godbolt.org/z/cnbjaK85T
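Something like this, I think (same kind of made-up Tracer type standing in for an arbitrary RAII object):

    #include <cstdio>

    struct Tracer {
        ~Tracer() { std::puts("cleanup"); }
    };

    long sum_to(long n, long acc) {
        {
            Tracer t;           // RAII object confined to an inner scope
            if (n == 0)
                return acc;     // base case; doesn't need to be a tail call
            acc += n;
        }                       // ~Tracer has already run by this point
        return sum_to(n - 1, acc);  // nothing left to do afterwards: a real tail call
    }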
I'm having trouble understanding this comment. You're saying that you can't generate a tail call when returning from the middle of a function that needs to clean things up? RAII is merely syntactic sugar for writing that control flow and making it mandatory.
Perhaps it's easier to think of tail-call semantics as simply implementing iteration: it's another way of expressing a for(...) loop. And if you used RAII in the body of your for loop you would expect the same thing.
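Rough sketch of the analogy (made-up example):

    #include <string>

    // The RAII object in the body is destroyed at the end of each iteration,
    // i.e. before control jumps back to the top of the loop -- which is
    // exactly the ordering a tail call has to arrange as well.
    long sum_to(long n) {
        long acc = 0;
        for (; n != 0; --n) {
            std::string scratch(64, 'x');  // non-trivial destructor, runs every iteration
            acc += n;
        }
        return acc;
    }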
I can understand the comment: even when the cleanup could be done before the call, which would allow the call to be made as a tail call, the compiler won't move the cleanup code up for you, and running the cleanup last is the default. Local variables defined inside a for loop, by contrast, are always destroyed before the next iteration.
Absolutely. RAII is an abstraction (and a useful one), but it has a cost: it prevents a useful optimization, because cleanup is required when the stack frame is torn down. You'd expect the same in C if you explicitly had to call a cleanup function after the call returns.
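Roughly the desugared picture, with made-up acquire/cleanup functions standing in for a constructor/destructor pair:

    #include <cstdlib>

    struct Resource { int* data; };

    static void acquire(Resource* r) { r->data = static_cast<int*>(std::malloc(64)); }
    static void cleanup(Resource* r) { std::free(r->data); }

    long sum_to(long n, long acc) {
        Resource r;
        acquire(&r);
        if (n == 0) {
            cleanup(&r);
            return acc;
        }
        long result = sum_to(n - 1, acc + n);  // can't be a jump:
        cleanup(&r);                           // cleanup still has to run afterwards
        return result;
    }

Written out like this it's obvious why the call isn't in tail position; RAII just hides the explicit cleanup call.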
What C++ does with RAII is make this tradeoff non-obvious. std::unique_ptr is a great example: colloquially a std::unique_ptr is "just a pointer", but not in this case, because its non-trivial destructor prevents TCO.
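For instance (a sketch; step() is just an assumed external function standing in for whatever gets tail-called, so the compiler can't inline it away):

    #include <memory>

    int step(int x);  // assumed to be defined in another translation unit

    // "Just a pointer", but not for TCO purposes: ~unique_ptr has to run
    // after step() returns, so the call is not in tail position.
    int with_smart_ptr(int x) {
        auto p = std::make_unique<int>(x);
        return step(*p);
    }

    // Freeing up front leaves nothing to do after the call, so the compiler
    // is free to emit it as a jump.
    int with_raw_ptr(int x) {
        int* p = new int(x);
        int v = *p;
        delete p;
        return step(v);
    }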
These tail-call functions are part of a program’s inner loop. It seems like we shouldn’t expect allocation within an inner loop to be fast? Other than local variables, that is.
In an interpreter, it seems like you're either allocating outside the interpreter loop (the lifetime is that of the interpreter itself), or within a particular (slow) instruction, or on behalf of the language being interpreted, where the lifetime can't be handled by RAII. There will be fast instructions that don't allocate and slower ones that do.
Interpreters are a bit weird: there are lots of instructions that do almost nothing, so the overhead of the dispatch loop is significant, yet there's a lot of complication inside the loop and little ability to predict which instruction comes next. That's unlike many loops, which can simply be unrolled.
I suppose the compiler could reorder function calls if it can prove there is no change in behavior? If so, then it could hoist dtors above the call and emit a jump. I doubt any compilers do this.
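What that hoisted version would have to look like, written by hand (hypothetical step() standing in for the tail-called function):

    #include <memory>

    int step(int x);  // hypothetical callee, defined elsewhere

    // As written: ~unique_ptr runs after step() returns, so no tail call.
    int original(int x) {
        auto p = std::make_unique<int>(x);
        return step(*p);
    }

    // The reordering the compiler would have to prove safe: cleanup hoisted
    // above the call. It's only valid if step() can never observe *p (e.g.
    // through a stashed copy of the pointer), which is exactly the proof
    // obligation that makes this a hard transformation in general.
    int reordered(int x) {
        int v;
        {
            auto p = std::make_unique<int>(x);
            v = *p;
        }                  // delete happens here, before the call
        return step(v);    // now a genuine tail call
    }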