Every compiler should recognize and optimize tail recursion. It's no harder than most other optimizations, and some algorithms are far better expressed recursively.
In general, tail recursion destroys stacktrace information: if f calls g, which tail-calls h, and h crashes, you won't see g in the stacktrace, and that is bad for debuggability.
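As a minimal C++ sketch of that scenario (same f/g/h names as above; assumes an optimized build where the compiler actually eliminates the tail call):

```cpp
#include <cstdio>

// h traps at runtime (integer division by zero typically raises SIGFPE).
int h(int x) { return 100 / x; }

// g's return is a tail call, so under TCO g's frame is replaced by h's.
int g(int x) { return h(x - 1); }

int f() { return g(1); }  // g passes 0 to h, which crashes

int main() { return f(); }
```

In a debugger, the backtrace at the crash shows f and h but no g, because g's frame was already gone by the time h started running.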
In lower level languages there are also a bunch of other issues:
- RAII can easily make functions that appear to be in a tail position not actually tail calls, because destructors implicitly run after the call (see the sketch after this list);
- there can be issues when reusing the stack frame of the caller, especially with caller-cleanup calling conventions;
- the compiler needs to prove that no pointers to the stack frame of the function being optimized have escaped, otherwise it would be reusing the memory of live variables, which is illegal.
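To make the RAII point concrete, here is a minimal C++ sketch (function names are hypothetical):

```cpp
#include <cstdio>
#include <string>

long helper(long n) { return n + 1; }

// The return statement looks like a tail call, but s's destructor has
// to run after helper() returns, so the compiler cannot reuse this
// function's stack frame for the call.
long not_really_a_tail_call(long n) {
    std::string s = "frame-local state with a non-trivial destructor";
    std::printf("%s\n", s.c_str());
    return helper(n);  // ~std::string() runs after this call
}

int main() {
    std::printf("%ld\n", not_really_a_tail_call(41));
    return 0;
}
```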
I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had. Tail recursion is equivalent to looping, so people should stop complaining when it gives them the same information a loop would.
> I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had.
That actually is a common criticism: you're describing the functional programmers. They would typically argue that building up state in transient loop variables is a mistake. The body of a loop should ideally be (at the time any stack trace gets thrown) a pure function of constant values and of a range that is preserved while being iterated over. That makes debugging easier.
I mean, if an ordinary non-recursive function call that just happened to be in tail position got eliminated, and that prevented me from getting the full stack trace while debugging, I might be annoyed.
In a couple languages I've seen proposals to solve this problem with a syntactic opt-in for tail call elimination, though I'm not sure whether any mainstream language has actually implemented this.
Language designers could keep taking ideas from Haskell, and allow functions to opt in to appearing in stack traces. Give the programmer control, and all.
AFAIK Zig is the only somewhat-big, well-known low-level language with TCO. Obviously, Haskell/OCaml and the like support it and are decently fast too, but systems programming languages they are not.
My understanding is that GCC and LLVM implement it as an optimization, so Rust, C, and C++ also have it implicitly, though it may or may not apply to your code.
But yes, Zig does have formal language syntax for guaranteeing tail calls at the language level (which I agree is the right way to expose this optimization).
Zig's TCO support is not much different from Clang's `[[clang::musttail]]` in C++. Both have the big restriction that the two functions involved are required to have the same signature.
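For the curious, a minimal sketch of what the Clang side looks like (illustrative `odd`/`even` helpers; note both share the signature `int(int, int)`, which is exactly what the restriction demands):

```cpp
#include <cstdio>

int even(int n, int acc);

// Mutually recursive counters; each call is annotated, so Clang must
// either emit a real tail call or reject the program at compile time.
int odd(int n, int acc) {
    if (n == 0) return acc;
    [[clang::musttail]] return even(n - 1, acc + 1);
}

int even(int n, int acc) {
    if (n == 0) return acc;
    [[clang::musttail]] return odd(n - 1, acc + 1);
}

int main() {
    // Recursion a million calls deep; safe only because the tail
    // calls are guaranteed rather than best-effort.
    std::printf("%d\n", odd(1000000, 0));
    return 0;
}
```

If `even` had a different signature than `odd`, Clang would reject the annotation instead of silently falling back to a regular call.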
> Both have the big restriction that the two functions involved are required to have the same signature.
I did not know that! But I am a bit confused, since I don't really program in either language. Where exactly in the documentation could I read more about this? Or see more examples?
The language reference for @call[0] was quite unhelpful to my untrained eye.
Generally I also find Zig's documentation pretty lacking, so instead I look for the relevant issues/PRs. In this case I found comments on this issue [1] which seem to still hold true. That same issue also links to the relevant LLVM/Clang issue [2], and the same restriction is being proposed for Rust [3]. This is where I first learned about it, and it prompted me to investigate whether Zig also suffers from the same issue.
This limitation ensures that the two functions use the exact same calling convention (input and output registers, and values passed via the stack), the details of which can depend on the particular architecture.
I couldn't actually figure out whether this TCO being done "fairly well" is a guarantee, or simply an optimization that may or may not apply, as in Rust (I'm referring to the language's native support, not what crates allow).
I know of these; I almost added a disclaimer too. That was not my point, as I am sure you understand. Also, OCaml has a GC, which makes it unsuitable for many applications common in systems programming.
Some of these issues are partially alleviated by supporting a limited form of tail recursion optimization: you mark a function with a tailrec keyword (as in Kotlin), and the compiler verifies that the function calls itself as its last statement. You also wouldn't expect a complete stack trace from such a function. At the same time, it probably helps with 90% of the recursive algorithms that would benefit from tail recursion.
My bigger issue with tail call optimization is that you really want it to be enforceable: if you accidentally deoptimize it for some reason, you can end up blowing your stack at runtime. Failure to optimize some pattern usually doesn't have such a drastic effect; normally the code just runs more slowly. So tail calls are one of those special optimizations you want a language annotation for, so that if it fails to apply you get a compiler error (and similarly, you may want it applied even in debug builds).
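A small C++ sketch of that failure mode (assumes Clang for the annotation mentioned earlier in the thread):

```cpp
#include <cstdio>

// This call looks like it's in tail position, but the multiplication
// happens after fact() returns, so it is NOT a tail call: every level
// keeps a live frame, and a deep enough n overflows the stack.
long long fact(long long n) {
    if (n <= 1) return 1;
    return n * fact(n - 1);
}

int main() {
    std::printf("%lld\n", fact(20));
    return 0;
}

// An annotation turns the silent hazard into a compile error:
//     [[clang::musttail]] return n * fact(n - 1);
// is rejected, because the returned expression is not a bare call in
// tail position.
```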
Parroting something I heard at a Java conference several years ago: tail recursion removes stack frames, but the security model is based on stack frames, so it has to be a JVM optimization, not a compiler optimization.
I've no idea whether this still holds once the security manager is removed.
The security manager was removed (well, “permanently disabled”) in Java 24. As you note, the permissions available at any given point can depend on the permissions of the code on the stack, and TCO affects this. Removal of the SM thus removes one impediment to TCO.
However, there are other things still in the platform for which stack frames are significant. These are referred to as “caller sensitive” methods. An example is Class.forName(). This looks up the given name in the classloader of the class that contains the calling code. If the stack frames were shifted around by TCO, this might cause Class.forName() to use the wrong classloader.
No doubt there are ways to overcome this — the JVM does inlining after all — but there’s work to be done and problems to be solved.
There are similarities in the problems, but there are also fundamental differences. With inlining, the JVM can always decide to deoptimize and back out the inlining without affecting the correctness of the result. But it can't do that with tail calls without exposing the program to a risk of StackOverflowError.
We've been using TCO here ("tail call optimization"), but I recall Guy Steele advocating for calling this feature TCE ("tail call elimination"), because programs can rely on TCE for correctness.
In theory, if all you do is implement algorithms, this sounds fine. But most apps implement horrible business processes, so what would one do with missing stacktraces? Maybe it could work in languages that can mark functions as pure.
Why is this not done?