
> For example, GCC will happily remove the dest == NULL branch in the following code

I think the blog should mention `-fno-delete-null-pointer-checks`

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
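
For reference, a minimal sketch of the kind of code the blog post is describing (the function name is illustrative, not the blog's actual example). GCC treats memcpy's pointer arguments as nonnull, so after the call it may assume `dest != NULL` and delete the branch, unless `-fno-delete-null-pointer-checks` is in effect:

```c
#include <stdio.h>
#include <string.h>

void copy_and_report(char *dest, const char *src, size_t n) {
    memcpy(dest, src, n);   /* UB if dest == NULL, even when n == 0 */
    if (dest == NULL) {     /* GCC may remove this whole branch */
        puts("dest was NULL");
        return;
    }
    puts("copied");
}
```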



> -fdelete-null-pointer-checks

> [...]

> This option is enabled by default on most targets.

What a footgun.

I understand that, in an effort to compete with other compilers for relevance, GCC pursued performance over safety. Has that era passed? Could GCC now choose safety over speed?

Alternatively, has someone compiled a list of flags one might want to enable in latest GCC to avoid such kinds of dangerous optimizations?


Just for the record, that's not the main purpose of -fdelete-null-pointer-checks.

Normally, it only deletes null checks after actual null pointer dereferences. In principle this can't change observable behavior. Null dereferences are guaranteed to trap, so if you don't trap, it means the pointer wasn't null. In other words, unlike most C compiler optimizations, -fdelete-null-pointer-checks should be safe even if you do commit undefined behavior.
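
A minimal sketch of that intended, benign case (hypothetical function, assuming a typical hosted target where address 0 is unmapped):

```c
/* The load precedes the check. On a target where a null dereference
 * traps, reaching the check implies p != NULL, so
 * -fdelete-null-pointer-checks lets GCC remove the dead branch. */
int read_value(int *p) {
    int v = *p;        /* would trap here if p were NULL */
    if (p == NULL)     /* provably unreachable; GCC may delete it */
        return -1;
    return v;
}
```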

This once caused a kerfuffle with the Linux kernel. At the time, x86_64 CPUs allowed the kernel to dereference userspace addresses, and the kernel allowed userspace to map address 0. Therefore, it was possible for userspace to arrange for null pointers to not trap when dereferenced in the kernel. Which meant that the null check optimization could actually change observable behavior. Which introduced a security vulnerability. [1]

Since then, Linux has been compiled with `-fno-delete-null-pointer-checks`, but it's not really necessary: Linux systems have long since enforced that userspace can't map address 0, which means that deleting null pointer checks should be safe in both kernel and userspace. (Newer CPU security features also protect the kernel even if userspace is allowed to map address 0.)

But anyway, I didn't know that -fdelete-null-pointer-checks treated "memcpy with potentially-zero size" as a condition to remove subsequent null pointer checks. That means that the optimization actually isn't safe! Once GCC is updated to respect the newly well-defined behavior, though, it should become truly safe. Probably.

The same can't be said for most UB optimizations – most of which can't be turned off.

[1] https://lwn.net/Articles/342330/


> Null dereferences are guaranteed to trap, so if you don't trap, it means the pointer wasn't null.

<laughs in embedded-system-with-no-MMU>


I once spent hours if not days debugging a problem with some code I had recently written because of this exact optimization.

It wasn't an embedded system, but rather an x86 BIOS boot loader, which is sort of halfway there. Protected mode enabled without paging, so there's nothing to trap a NULL.

Completely by accident I had dereferenced a pointer before doing a NULL check. I think the dereference was just printing some integer, which of course had a perfectly sane-looking value so I didn't even think about it.

The compiler (I can't remember whether it was gcc or clang at that point) decided that, since I had already successfully dereferenced the pointer, it could elide the null check and the code path associated with it.

Finally I ran it in VMware and attached a debugger, which skipped right over the null check even though I could see in the debugger the value was null. So then I went to look at the assembly the compiler generated, and that's when I started to understand what had happened.

It was a head-slapper when I found the dereference above. I added a second null check or moved that code or some such, and that was it.


Now map those hours and days into actual money taken out of the project budget, and you realise why some businesses prefer certain languages over others.


There was a more egregious case that got Linus further pissed off with GCC, due to a 'dereference' that would not trap but still caused a later null check to be deleted (e.g. `int *foo = &bar->baz` is basically just calculating an offset from `bar`, so it will not fail at runtime, but it is still a dereference according to the abstract machine and therefore undefined if `bar` is NULL). I think the risk of something like that is why the optimization is still disabled in the kernel. A rough sketch of that shape follows (simplified; not the actual kernel code).
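
```c
struct bar { int pad; int baz; };

int read_baz(struct bar *bar) {
    /* Taking a member's address compiles to pointer arithmetic and
     * performs no load, so it cannot trap at runtime... */
    int *foo = &bar->baz;  /* ...yet the abstract machine treats it as a
                              dereference: undefined if bar == NULL */
    if (bar == NULL)       /* GCC may delete this later check */
        return -1;
    return *foo;
}
```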


Usually, when one marks an argument as nonnull via a function attribute, one wants NULL checks to be removed.
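
A minimal sketch of what that looks like with GCC's `nonnull` attribute (the function name is hypothetical):

```c
#include <string.h>

/* The attribute is a promise to the compiler: it may warn on a
 * literal NULL at the call site and assume p != NULL in the body,
 * optimizing the defensive check away. */
__attribute__((nonnull(1)))
size_t length_or_zero(const char *p) {
    if (p == NULL)     /* likely elided: the attribute asserts p != NULL */
        return 0;
    return strlen(p);
}
```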


There are two similar but distinct function attributes for nullability. One affects codegen, one affects diagnostics only.


Which are those? I only know about nonnull, nonnull_if_nonzero and returns_nonnull:

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...


Irrelevant, because delete-null-pointer-checks happens even in the absence of a nonnull function attribute; see GP's godbolt link and the documentation, which makes no reference to that attribute.

That's what makes it dangerous!


That is a side effect of passing the pointer as a function parameter marked nonnull. It implies that the pointer is nonnull and any NULL checks against it can be removed. Pass it to a normal function and you will not see the NULL check removed.



