No, recursion still works well without TCO (though as a Schemer, I love TCO). I was programming in BCPL in the early 1970s, and it handled recursive procedures with aplomb. The big revolution was realizing that, if you don't allow access to automatic variables declared in outer scopes, you can store all of a procedure's variables in its stack frame and access them with a small offset from the stack or frame pointer. That made automatic variables just about as fast as static ones (which, on System/360, had to be accessed via a base register), with only small overheads at call and return sites.
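To make the frame-pointer idea concrete, here is a rough sketch in C that simulates it by hand. It is not how BCPL or any real compiler laid out its frames; the names `stack`, `fp`, `FRAME_SIZE` and the three-slot layout are all made up for illustration. The point is only that every variable of an activation lives at a fixed small offset from the frame base, so a recursive call just needs a fresh base.

```c
#include <assert.h>

/* Hypothetical simulated machine: one flat stack of words plus a frame pointer. */
enum { STACK_WORDS = 256 };
static long stack[STACK_WORDS];
static int fp = 0; /* index of the current frame's base */

/* Assumed frame layout, chosen for this sketch only:
   offset 0: caller's saved frame pointer
   offset 1: argument n
   offset 2: local variable holding the result */
enum { SAVED_FP = 0, ARG_N = 1, LOCAL_RESULT = 2, FRAME_SIZE = 3 };

/* Factorial whose argument and local are kept in the simulated frame. */
static long fact(int new_fp, long n) {
    assert(new_fp + FRAME_SIZE <= STACK_WORDS); /* simulated stack overflow check */

    /* Call overhead: save the caller's frame base, then switch to ours. */
    stack[new_fp + SAVED_FP] = fp;
    fp = new_fp;
    stack[fp + ARG_N] = n;

    if (stack[fp + ARG_N] <= 1) {
        stack[fp + LOCAL_RESULT] = 1;
    } else {
        /* The callee's frame starts right after ours. */
        long sub = fact(fp + FRAME_SIZE, stack[fp + ARG_N] - 1);
        /* fp points at our frame again here, so the same offsets still work. */
        stack[fp + LOCAL_RESULT] = stack[fp + ARG_N] * sub;
    }

    long result = stack[fp + LOCAL_RESULT];
    fp = (int)stack[fp + SAVED_FP]; /* Return overhead: restore the caller's frame. */
    return result;
}
```

Calling `fact(0, 5)` starts the first frame at the bottom of the simulated stack and gives 120. Each variable access is one add and one load, and the call and return overhead is the couple of lines that save and restore `fp`.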

Again on System/360, I benchmarked BCPL procedure call costs against subroutine call costs in Fortran G (the non-optimizing compiler). BCPL calls were about three times faster.

That said, as soon as you added multi-tasking (what we'd now call threads), it all went to hell. It's not an accident that one IBM PL/I manual of the 1960s said “Do not use procedures, they are expensive.”

As mentioned by others, it was the tiny stack in the 6502 that killed this approach. I appreciate all those who pine for the 6502, but it made implementing modern (even for the 1970s) languages almost impossible.