
I am misunderstanding something about `restrict`.

> the original program already had Undefined Behavior, or (at least) one of the optimizations is wrong.

As far as I know, since the uwu function was declared with x and y as restrict, the way uwu is called in main is undefined behavior, because both pointers point into the same array and are both 'derived' from that array.

I guess if I am wrong, it's because `restrict` does not care whether both are derived from the same pointer. Instead, it might only care whether x was derived directly from y or y was derived directly from x. Is there a good reason to care only about this 'directly derived from' relation rather than 'shared derivation'? It would sorta suck if you couldn't use a function with two restrict arguments pointing into the same array, but this example seems to suggest that might be a reasonable requirement.



`restrict` pointers have nothing to do with the underlying "object" they point into (an array in this case). `restrict` lets the compiler assume that reads through a restricted pointer or any pointer expressions derived from it are only affected by writes through that pointer or expressions derived from it. There are only two writes in this example: `*x = 0`, which is trivially correct; and `*ptr = 1`, where `ptr` is derived from `x` (via `ptr = (int *)(uintptr_t)x`) so this is also correct. However, it's now easy to see that the optimization replacing `xaddr` with `y2addr` causes undefined behavior since it changes `ptr` to be derived from `y`. The post addresses this in "The blame game" and mentions that integers could carry provenance but that it's infeasible to actually do this.
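For reference, here is a reconstruction of the function being discussed, pieced together from what this thread mentions (`uwu`, `xaddr`, `y2addr`, `ptr`, the writes `*x = 0` and `*ptr = 1`, and `return *x`). The `y - 1` step is my reading of the article's `y2`, so treat this as a sketch that may differ in detail from the original.

```c
#include <stdint.h>

/* Reconstruction of the article's uwu (details may differ). */
static int uwu(int *restrict x, int *restrict y) {
    *x = 0;                          /* write through x */

    uintptr_t xaddr = (uintptr_t)x;
    int *y2 = y - 1;                 /* derived from y; equals x when y == x + 1 */
    uintptr_t y2addr = (uintptr_t)y2;

    if (xaddr == y2addr) {
        int *ptr = (int *)xaddr;     /* round trip: ptr is derived from x */
        *ptr = 1;                    /* so this write goes "through" x */
    }

    return *x;                       /* read through x */
}
```

With `int i[2] = {0, 0};`, the call `uwu(&i[0], &i[1])` takes the branch and returns 1. With pointers into two different arrays, the address comparison fails and it returns 0, which is why the relationship between x and y only becomes known from the caller.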

The weak provenance solution is to ban optimizing the pointer to integer casts since they have a (conceptual) side-effect. The strict provenance proposal points out that the side effect is only observable if the integer is cast back to a pointer, so we can keep optimizing pointer to integer casts and instead ban integer to pointer casts. For example, an operation like `(int *)xaddr` is banned under strict provenance. Instead, we provide a casting operation that includes the pointer to assume the provenance of; something like `(int * with x)xaddr`. With this new provenance-preserving cast, we can see that the final optimization of replacing `*x` with its previously assigned value of `0` is no longer possible because the code in between involves `x`.
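Standard C has no `(int * with x)` cast, but its effect can be approximated. The sketch below uses a hypothetical helper, `with_provenance_of` (not a standard API), that never turns the integer straight back into a pointer: it offsets the provenance-carrying pointer `base` by the integer distance, so the result is derived from `base` through ordinary pointer arithmetic. It assumes a flat address space where `uintptr_t` values differ by byte distance (true on mainstream platforms, not guaranteed by ISO C), that the address lies in the same array as `base`, and that it is `int`-aligned. Rust's strict-provenance `with_addr` is similar in spirit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a pointer to `addr` that is derived from
 * `base`, so it carries base's provenance. Assumes addr is inside the
 * same array as base and is int-aligned. */
static int *with_provenance_of(int *base, uintptr_t addr) {
    uintptr_t base_addr = (uintptr_t)base;
    ptrdiff_t offset;

    /* Compute the element offset without relying on unsigned wraparound. */
    if (addr >= base_addr) {
        assert((addr - base_addr) % sizeof(int) == 0);
        offset = (ptrdiff_t)((addr - base_addr) / sizeof(int));
    } else {
        assert((base_addr - addr) % sizeof(int) == 0);
        offset = -(ptrdiff_t)((base_addr - addr) / sizeof(int));
    }

    /* Pointer arithmetic on base: the result is derived from base. */
    return base + offset;
}
```

In `uwu`, the branch would become `int *ptr = with_provenance_of(x, xaddr);`. Even if a pass rewrites `xaddr` to `y2addr`, `ptr` is still visibly derived from `x`, so the compiler can no longer assume `*x` is unchanged.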


> However, it's now easy to see that the optimization replacing `xaddr` with `y2addr` causes undefined behavior since it changes `ptr` to be derived from `y`.

Yeah, this article is great and the framing is pretty perfect. It really shows that optimization passes can't remove information, or else they risk tricking later passes. I definitely agree with OP that "the incorrect optimization is the one that removed xaddr"; that optimization seems wild to me. You only know y is x + 1 because of the way it's constructed in the calling function (main). So the compiler... inlines an optimized version of the function that removes most uses of x? Isn't that optimizer fundamentally broken? Especially in a language with `volatile` and `restrict`?
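To make the chain concrete, here is roughly what `uwu` would look like after the hypothetical passes the article walks through: `(int *)xaddr` replaced by `(int *)y2addr` inside the branch (the two are equal there), the round trip `(int *)(uintptr_t)y2` folded back to `y2`, and finally `return *x` replaced by the previously stored `0`, since under `restrict` the only intervening write now goes through a pointer derived from `y`. The name `uwu_after_passes` is made up, and `restrict` is dropped so that this sketch is itself well-defined C.

```c
#include <stdint.h>

/* Hypothetical output of the pass pipeline (name invented).
 * restrict removed so the sketch itself has no UB. */
static int uwu_after_passes(int *x, int *y) {
    *x = 0;

    uintptr_t xaddr = (uintptr_t)x;
    int *y2 = y - 1;
    uintptr_t y2addr = (uintptr_t)y2;

    if (xaddr == y2addr) {
        /* (int *)xaddr -> (int *)y2addr -> y2: the store now goes
         * through a pointer derived from y, not x. */
        *y2 = 1;
    }

    /* Last pass: under restrict, *x "cannot" have changed since *x = 0. */
    return 0;
}
```

Called as `uwu_after_passes(&i[0], &i[1])`, this stores 1 into `i[0]` yet returns 0, whereas the original program prints 1. That discrepancy is what shows at least one of the passes must be wrong.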


It's a static function, so the compiler knows main is the only caller. gcc -O optimizes the whole program down to printf("%d\n", 1).


Sure, but that requires compilation-unit-level analysis or inlining (when inlined, you can include pointer provenance from main); otherwise you can't guarantee the relationship between x and y.

I guess what bugs me about optimizations is that it feels like something _I_ should be doing. Like if GCC told me this code optimizes down to printf 1 and why, I'd question what I was doing (and rightly so). Doing it automatically feels like too much spooky action at a distance.


In the case of the code we're talking about here, gcc/clang do rely on inlining to optimize down to the single printf. I don't think there's any actual compiler that does the dangerous and invalid optimization in the article.


OH! I've clearly misunderstood then. Rereading, it does look like this is just a hypothetical to illustrate the tension between allowing pointer-int-pointer round-trips and foiling analysis based on pointer provenance. OK I'm caught up, thank you haha.


Indeed I don't think C cares about how the pointers you mark as "restrict" were constructed, it just cares about how you actually use the pointers and in the example code they are never used in an "overlapping" way.
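A small illustration of that point, using a made-up function `scale_into`: both parameters are `restrict`, and passing two pointers into the same array is fine as long as no element written through one pointer is accessed through the other.

```c
#include <stddef.h>

/* Hypothetical example: dst and src are both restrict. */
static void scale_into(int *restrict dst, const int *restrict src,
                       size_t n, int factor) {
    for (size_t k = 0; k < n; k++) {
        dst[k] = src[k] * factor;
    }
}
```

On `int a[8]`, `scale_into(&a[4], &a[0], 4, 10)` is fine: it writes `a[4..7]` and reads only `a[0..3]`, even though both pointers come from the same array. `scale_into(&a[1], &a[0], 4, 10)` is undefined behavior, because `a[1]` is modified through `dst` and then read through `src`.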

I just checked cppreference and it says: "if some object that is accessible through P [marked restrict] (directly or indirectly) is modified, by any means, then all accesses to that object (both reads and writes) in that block must occur through P (directly or indirectly), otherwise the behavior is undefined"

The only accesses I can see are "*x = 0;", "*ptr = 1;" and "return *x;". "*ptr" is an indirect access through "x" by way of an integer round trip. If the original program is indeed UB, that would mean that pointers built from an integer round trip are not considered (indirect) accesses through the original pointer.

The author assumes that it is an indirect access: "However, the only possibly suspicious part of the original program is a pointer-integer-pointer round-trip – and if casting integers to pointers is allowed, surely that must work. I will, for the rest of this post, assume that replacing x by (int*)(uintptr_t)x is always allowed."


> As far as I know, since the uwu function was declared with x and y as restrict, the way uwu is called in main is undefined behavior.

In the first example the integer-cast derivative of the second pointer does alias the first but isn't used to access either object, therefore there is (or should be) no UB.

> I guess if I am wrong its because `restrict` ...

The `restrict` keyword was (IIUC) specifically intended to make it possible to write functions like `memcpy()` with optimizations that are made possible by knowing that the pointed-to objects do not overlap. The semantics of that are crystal clear in a hand-wavy way, but... very difficult to nail down exactly.
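The basic shape of the optimization `restrict` is meant to enable looks like this (the same reasoning as the step that turns `return *x` into `return 0` in the article). `store_and_reload` is a made-up name; C99 itself declares `memcpy` with `restrict` parameters for the same no-overlap reason.

```c
/* Hypothetical example. Because p and q are restrict, the compiler may
 * assume the store through q does not change *p, and fold the final
 * load to the constant 1 instead of reloading from memory. */
static int store_and_reload(int *restrict p, int *restrict q) {
    *p = 1;
    *q = 2;
    return *p;
}
```

With distinct objects (`int a, b;` then `store_and_reload(&a, &b)`) the result is 1 at any optimization level. Calling it with `p == q` is undefined behavior, and that is exactly what licenses the compiler to skip the reload.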

With a hand-wavy definition of `restrict` it's pretty clear that the first `uwu()` is perfectly fine and has no UB.



