GCC has long been known to define undefined behavior in C unions. In particular, type punning in unions is undefined behavior under the C and C++ standards, but GCC (and Clang) define it.
In C89, it was implementation-defined. In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex. From C11 on, the annex was fixed.
> but UB in C++
C++11 adopted "unrestricted unions", which introduced the concept of an active member: accessing any other member is UB unless you first make it active. However, making a member active relies on constructors and destructors, which primitive types don't have, so the standard isn't particularly clear on what happens here. The current consensus is that it's UB.
C++20 added std::bit_cast, which is a much safer interface for type punning than unions.
> punning through incompatible pointer casting was UB in both
There is a general rule that accessing an object through an 'incompatible' lvalue is illegal in both languages. In general, changing the const or volatile qualifier on the object is legal, as is reading via a different signed or unsigned variant, and char pointers can read anything.
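A minimal C sketch of the accesses that rule permits may help. The function names are made up for illustration, and the example assumes `float` and `uint32_t` have the same size:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide, as on most common platforms. */
static_assert(sizeof(float) == sizeof(uint32_t), "example assumes 32-bit float");

/* Allowed: any object may be inspected through an unsigned char lvalue. */
static unsigned char first_byte(const float *f)
{
    const unsigned char *p = (const unsigned char *)f;
    return p[0];
}

/* Allowed: memcpy copies the object representation byte by byte. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Allowed: an int may be read through its unsigned counterpart. */
static unsigned int read_as_unsigned(const int *i)
{
    return *(const unsigned int *)i;
}

/* NOT allowed (kept as a comment on purpose):
 *     uint32_t bad = *(uint32_t *)&some_float;
 * reads a float through an incompatible lvalue type.
 */
```

The `memcpy` form is the usual portable substitute for the pointer cast shown in the final comment.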
> In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex.
In C99, union type punning was put under Annex J.1, which is unspecified behavior, not undefined behavior. Unspecified behavior is basically implementation-defined behavior, except that the implementor is not required to document the behavior.
You can, but in the context of the standard, you'd be wrong to do so. Undefined behavior and unspecified behavior have specific, different, meanings in context of the C and C++ standards.
> You can, but in the context of the standard, you'd be wrong to do so. Undefined behavior and unspecified behavior have specific, different, meanings in context of the C and C++ standards.
> Conflate them at your own peril.
I think that ryao was not conflating them, but literally just pointing out, as a joke, that "UB" can stand for "undefined behavior" or "unspecified behavior." Taking advantage of this is inviting dangerous ambiguity, which is why ryao's suggestion ended with ":)," but I think that saying that it's wrong is an overstatement.
There has been plenty of misinformation spread on that. One of the GCC developers told me explicitly that type punning through a union was UB in C, but defined by GCC when I asked (after I had a bug report closed due to UB). I could find the bug report if I look for it, but I would rather not do the search.
From a draft of the C23 standard, this is what it has to say about union type punning:
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
In past standards, it said "trap representation" rather than "non-value representation," but in none of them did it say that union type punning was undefined behavior. If you have a PDF of any standard or draft standard, just doing a search for "type punning" should direct you to this footnote quickly.
So I'm going to say that if the GCC developer explicitly said that union type punning was undefined behavior in C, then they were wrong, because that's not what the C standard says.
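For concreteness, here is a small C sketch of the pattern that footnote describes. The union and function names are hypothetical, and the expected bit pattern assumes `float` is IEEE 754 binary32:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float is 32 bits wide, so both members cover the same bytes. */
static_assert(sizeof(float) == sizeof(uint32_t), "example assumes 32-bit float");

union float_bits {
    float    f;
    uint32_t u;
};

static uint32_t float_to_bits(float value)
{
    union float_bits pun;
    pun.f = value;   /* store through one member */
    return pun.u;    /* read through the other: the bytes are reinterpreted */
}
```

Because `uint32_t` has no padding bits, every bit pattern is a valid value for it, so the "non-value representation" caveat in the footnote does not come into play for this direction of the conversion.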
> (11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).
So it's a little more constrained in the ramifications, but the outcomes may still be surprising. It's a bit unfortunate that "UB" aliases to both "Undefined behavior" and "Unspecified behavior" given they have subtly different definitions.
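One way to picture that Annex J.1 item is a union whose members differ in size. The sketch below uses made-up names and assumes the union is exactly as large as its widest member. Storing into the small member leaves the remaining bytes with unspecified values, so only the overlapping byte can be relied on:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

union small_and_wide {
    uint8_t  small;
    uint32_t wide;
};

/* Assumption: no padding beyond the widest member. */
static_assert(sizeof(union small_and_wide) == sizeof(uint32_t),
              "example assumes the union is as large as its widest member");

/* Returns the byte that overlaps `small` after a store into `small`. */
static uint8_t overlapping_byte_after_small_store(uint8_t value)
{
    union small_and_wide u;
    unsigned char bytes[sizeof u];

    u.wide = 0xFFFFFFFFu;
    u.small = value;   /* the other three bytes of u.wide are now unspecified */

    memcpy(bytes, &u, sizeof u);
    return bytes[0];   /* byte 0 belongs to `small`, so its value is known */
}
```

Reading `u.wide` after the store to `u.small` would not be undefined, but the standard does not promise that the upper bytes still hold `0xFF`.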
From section 4 we have:
> A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.4.
I actually might, although not now. Thanks for the link. I'm surprised he directly contradicted the C standard, rather than it just being a misunderstanding.
It doesn't. That commenter is saying that in C99, it was unspecified behavior. Since C11 onward, it's been removed from the unspecified behavior annex and type punning is allowed, though it may generate a trap/non-value representation. It was never undefined behavior, which is different.
Edit: no, it's still in the unspecified behavior annex, that's my mistake. It's still not undefined, though.
I am a member of the standards committee and a GCC maintainer. The C standard supports union punning. (You are right though that relying on godbolt examples can be misleading.)
> type punning in unions is undefined behavior under the C and C++ standards
Union type punning is entirely valid in C, but UB in C++ (one of the surprisingly many subtle but still fundamental differences between C and C++). There's specifically a (somewhat obscure) footnote about this in the C standard, which has also been clarified further in one of the recent C standards.
There is no footnote about it in the C standard. Someone proposed adding one to standardize the behavior, but it was never accepted. Ever since then, people keep quoting it even though it is a rejected amendment.
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
(though this footnote has been present as far back as C99, albeit with different numbers as the standard has added more text in the intervening 24 years).
It is an excerpt being taken out of context. Of course it is quite clear. Taking it out of context ignores everything else that the standard says. That interpretation is wrong as far as compiler authors are concerned.
The context is that it's a footnote. The footnote is referenced in this paragraph:
> A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member (106), and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.
> 106) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
In that same document, union type punning is explicitly listed under Annex J.1, Unspecified Behavior:
> (11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).
The standard is extremely clear and explicit that it's not undefined behavior.
I am a member of the C standards committee, and I'm telling you you're wrong here. Martin Uecker is also a member of the C standards committee, and has just responded to that bug saying that the comment you linked is wrong. I, and others here, have quoted literal standards text to you explaining why type punning through unions is well-defined behavior in C.
I don't know who Andrew Pinski is, but they're factually incorrect regarding the legality of type punning via unions in C.
Andrew is a GCC developer who is very competent (much more than myself regarding GCC), but I think he was mistakenly assuming the C++ rules apply to C here as well.
EDIT: This comment is wrong, see fsmv’s comment below. Leaving for posterity because I’m no coward!
- - -
Undefined behavior only means that the spec leaves a particular situation undefined and that the compiler implementor can do whatever they want. Every compiler defines undefined behavior, whether it’s documented (or easy to qualify, or deterministic) or not.
It is in poor taste that gcc has had widely used, documented behaviors that are changing, especially in a point release.
I think you're confusing unspecified and undefined behavior. UB could do something randomly different every time, whereas unspecified behavior must choose one of the allowed options.
In a lot of cases, optimizing compilers just assume UB doesn't exist. Yes, technically the compiler does do something, but there's still a big difference between the two.
When you have a big system many people rely on you generally try to look for ways to keep their code working - not look for the changes you’re contractually allowed to make.
GCC probably has a better justification than “we are allowed to”.
Undefined in the standard doesn't mean undefined in GCC. Type-punning through unions has always been a special case that GCC has taken care with beyond the standard.
The code was already broken. It was undefined behavior.
That's a problem with C and its undefined behavior minefields.