
> The most terrifying concept is still the good old undefined behavior.

I think people tend to parrot undefined behavior as if it's some kind of gotcha when in practice it only means one of two things:

* Don't write code whose behavior you haven't bothered to understand,

* You bothered to check what your implementation does and decided to write implementation-specific code for your own personal reasons.

By definition, each and every programming language in existence which is not specified in an official reference document consists exclusively of undefined behavior. Why? Because none of its expected behavior is defined. That's what these nonsensical discussions boil down to.

So why is this somehow expected to be such an insurmountable gotcha with C++, when the rest of the world uses languages which are entirely undefined without any concern?

Boggles the mind.



> I think people tend to parrot undefined behavior as if it's some kind of gotcha when in practice this only means two things:

No, not really. In fact an excellent counter example can be found on this very iceberg: https://blog.regehr.org/archives/161

At some optimisation levels, C++ says the example program disproved Fermat's Last Theorem, which should come as a surprise as there are no known counterexamples. The program is also valid C. When compiled with good C compilers at all optimisation levels (both gnu and clang count as good), the program never exits because it doesn't find a counterexample. But the gnu and clang C++ compilers do behave as described. The difference is caused by what the C and C++ language standards consider to be UB.

Why? C++ (but not C) treats a side-effect-free infinite loop as undefined behaviour, so at high enough optimisation levels the compiler spots the infinite loop (which is the correct behaviour if the theorem is true) and goes with the other option: it assumes the loop must terminate, i.e. that a counterexample is found. (As an aside, it's impressive it does detect this particular infinite loop.)

Why is that a problem? Well, lots of programs are deliberate infinite loops. In practice many don't qualify as UB because of various conditions listed in the article, but you would have to be a language lawyer to know them. The author gives real-life examples of how he was caught out by the surprise removal of his loop.

Deliberate infinite loops are actually fairly common and are perfectly correct code. Proving a loop is infinite is famously undecidable in general, so you can usually get away with it, safe in the knowledge the compiler won't spot it. But compilers keep chipping away at the edge cases, so one day you can find your program changes its behaviour drastically just because it was compiled with a new version of the compiler, or even just different optimisation options.
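
To make this concrete, here's a small sketch in the spirit of the program from the linked post (not the exact code; behaviour will depend on your compiler version and flags). The loop has no side effects and can only exit by finding a counterexample to Fermat's Last Theorem for n = 3, so it should run forever:

    #include <stdio.h>

    /* Searches a bounded range for a counterexample to a^3 == b^3 + c^3.
       No counterexample exists, so the loop never terminates and the
       function can only ever return by finding one. MAX = 1000 keeps
       every cube within a 32-bit int, so there is no overflow UB here. */
    static int fermat(void) {
        const int MAX = 1000;
        int a = 1, b = 1, c = 1;
        while (1) {                       /* no side effects in the body */
            if (a * a * a == b * b * b + c * c * c)
                return 1;                 /* "counterexample" found */
            a++;
            if (a > MAX) { a = 1; b++; }
            if (b > MAX) { b = 1; c++; }
            if (c > MAX) { c = 1; }
        }
    }

    int main(void) {
        if (fermat())
            printf("Fermat's Last Theorem has been disproved.\n");
        return 0;
    }

Compiled as C (say, gcc -O2 -std=c11) this spins forever. Compiled as C++ at higher optimisation levels, gcc and clang have been observed to assume the side-effect-free loop terminates, and the program can fall straight through to the "disproved" message. Adding a side effect to the body (I/O, a volatile access, an atomic operation) takes the loop back outside that rule, which is exactly the language-lawyer territory the article describes.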


> No, not really. In fact an excellent counter example can be found on this very iceberg: https://blog.regehr.org/archives/161

I'm not sure you read the blog post you cited, because it states exactly this:

> In C, overflowing a signed integer has undefined behavior, and a program that does this has no meaning at all. It is ill-formed.

I think this is quite clear. This is exactly what I stated in my first example. I'm not sure why you missed that. It's the whole point of undefined behavior, and what people tend to confuse about what it means.

Then there's the nuance that those who mindlessly parrot undefined behavior as some kind of gotcha clearly fail to understand, which is the fact that implementations (meaning, compiler vendors) are given free rein to implement whatever implementation-specific behavior they see fit. Why? Because the standard left that behavior undefined, which also means the standard does not prevent implementations from defining their own behavior.

Do you understand the whole purpose of this? The blog post you cited clearly shows examples of C programs that use data types that are system- and implementation-specific, and thus overflow is left undefined because, just like the integer types, its behavior can and does differ among systems. It would be absurd to specify that overflows should wrap around/saturate values/stay at MAX/throw exceptions/terminate the program, because systems implement this differently. The role of a programming language is to write programs that machines run, and thus the programs need to express what each machine does, not what some other machine does.
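
And implementations do exercise that freedom. As a hedged illustration (the flag exists in gcc and clang; other compilers have their own equivalents or nothing at all), both let you opt into defined wrap-around semantics for signed overflow with -fwrapv, turning behavior the standard leaves undefined into behavior the implementation defines:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;
        /* Signed overflow is undefined behavior per the standard.
           Built with `gcc -fwrapv` (or `clang -fwrapv`) the compiler
           promises two's-complement wrap-around, so this prints INT_MIN.
           Without the flag, the optimizer is free to assume the
           overflow never happens. */
        x = x + 1;
        printf("%d\n", x);
        return 0;
    }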

I recommend you re-read what I wrote in my post to see the whole point of undefined behavior and its critical importance. It is high time that this misconception is put to rest.


> I recommend you re-read what I wrote in my post to see the whole point of undefined behavior and its critical importance. It is high time that this misconception is put to rest.

Uhmmm, did you read what I said? I ask because it looks like a red haze descended after you read the first few words. I've been writing C for 40 years. I know why C made integer overflow UB. It seemed perfectly sensible to me. That's why I didn't mention it. I'm not sure why you brought it up. The article itself says this:

    Third, there are no integer overflow games going on here, as long as the code is compiled for a platform where an int is at least 32 bits. This is easy to see by inspecting the program. The termination problems are totally unrelated to integer overflow.
The thing I was discussing was C++ defining an infinite loop as UB. I'm not alone in thinking that was a bad idea. The C standardisation group apparently also thinks that, because such loops are not UB in C. You can check it for yourself: compile the Fermat program with gnu C and it behaves sanely; compile it with gnu C++ and it gets the wrong answer. clang behaves identically. The author of the article also finds it amazing:

    In other words, from the point of view of the language semantics, a loop of this form is no better than an out-of-bounds array access or use-after-free of a heap cell. This somewhat amazing fact ...
So yes, I'm firmly in the camp that the C++ committee lost the plot here.

I think the way C and C++ handle UB belongs to a bygone era. In their defence, when I was writing C 40 years ago there was no choice. Checking whether every integer operation overflowed imposed an unacceptable performance penalty. While today a language like Rust rules out most UB at compile time (outside unsafe code), back then C compilers were barely more than overblown assemblers. UB simply meant "you get what the hardware gives you". Notice that means on a given arch there was no real undefined behaviour. I knew perfectly well what would happen on x86 if an integer overflowed, I knew whether a char was signed, and I wrote my programs accordingly, sometimes deliberately exploiting what I knew would happen. UB only bit you hard when you ported your program.

What happened is that compiler writers, seeking to squeeze out every last bit of performance, pushed "undefined" to mean "since it's undefined, I can do whatever I damned well please", and with C++ and infinite loops they've managed to push that beyond absurdity. Meanwhile other languages have gone in the opposite direction. Rather than giving programmers more rope to hang themselves with, they've added runtime and compile-time checks that either outright ban undefined behaviour or warn you about it.
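
Even within C you can buy into that philosophy today. As a hedged example (checked against gcc and clang; coverage varies by version), UndefinedBehaviorSanitizer turns many silent UB cases into run-time diagnostics instead of licence for the optimiser:

    #include <stdio.h>

    int main(int argc, char **argv) {
        (void)argv;
        int big = 1000000000;            /* 1e9, fits in a 32-bit int */
        /* Build with: cc -fsanitize=undefined demo.c
           With no arguments argc is 1, so this computes 3e9 in an int,
           overflowing it (assuming a 32-bit int). UBSan reports a
           diagnostic along the lines of "signed integer overflow:
           1000000000 * 3 cannot be represented in type 'int'" at run
           time instead of silently optimising around the UB. */
        int boom = big * (argc + 2);
        printf("%d\n", boom);
        return 0;
    }

It's opt-in and has a run-time cost, but it moves the behaviour from "whatever the optimiser pleases" to a defined diagnostic.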



