Tail call elimination is not an optimization because it changes the semantics of the program. The feature can take a program which would previously fail to terminate due to a stack overflow and cause it to terminate without error.
Perhaps TCO is better thought of as a language extension.
That seems no different than any other optimization: very directly tons of optimizations would reduce stack usage which would then change a given input from a stack overflow to a successful execution. Similarly anything that reduces heap memory usage or code size would also do the same.
That's irrelevant - your assertion was that a change that changes semantics by preventing stack overflow cannot be called an optimization. The commenter showed that is false, and gave reasons why.
Whether tail call goes from O(n) to O(1) doesn't change any of the above.
Perhaps TCO is better thought of as a language extension.