The code intentionally uses `y` to access stuff that it also accesses through `x...

hvdijk · on April 15, 2022

> With the right hackish arithmetic, all pointers could be aliased with pointer arithmetic in C.

This is not true. The behaviour is undefined in C if pointer arithmetic results in crossing the beginning or end of an object. If we have `int a, b;`, then the behaviour is undefined if we do `&b - &a`, even if we never use the result to try to reconstruct `&b` from `&a`.
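For contrast, here is a minimal sketch of pointer arithmetic that stays inside one array and is therefore well-defined. The function name `elements_in_array` is made up for illustration. It relies on the rule that a pointer one past the last element may be formed and subtracted, as long as it is not dereferenced.

```c
#include <stddef.h>

/* Hypothetical helper: distance between the start of a local array and
   its one-past-the-end position, measured in elements. */
ptrdiff_t elements_in_array(void)
{
    int a[4] = {0};
    int *first = &a[0];
    int *end = a + 4;    /* one past the end: may be formed, not dereferenced */
    return end - first;  /* both pointers belong to the same array object */
}
```

Replacing `a + 4` with `a + 5`, or subtracting pointers into two unrelated objects, would move this into undefined territory.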

> Using `restrict` says your code won't do that so the compiler can optimize more.

`restrict` covers objects modified through one pointer and accessed through another. It does not cover pointers that point to the same object in general; if only one is used to access the object, or if both are used only for reading, all is fine. (It's actually a little bit more complicated than I'm presenting here, but not in a way that's relevant right now.)
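As a small sketch of the read-only case (modelled on the kind of example the C standard itself gives for `restrict`; the name `add_arrays` is hypothetical): the two source pointers may refer to the same array, because nothing is modified through either of them.

```c
#include <stddef.h>

/* dst is written, a and b are only read. Passing the same array for
   a and b is fine; passing dst as a or b would break the restrict
   contract, because the object would be modified through one pointer
   and accessed through another. */
void add_arrays(size_t n, int *restrict dst,
                const int *restrict a, const int *restrict b)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

A call such as `add_arrays(3, out, x, x)` is well-defined as long as `out` and `x` do not overlap.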

xscott · on April 15, 2022

Seems like we need a C compiler flag like "--acknowledge-reality", because the compilers for nearly 100% of the server, desktop, and mobile computers in the real world have flat address spaces and do allow arbitrary pointer arithmetic.

Yes, it's technically UB. But which operation could any C compiler disallow in practice: converting from pointer to integer, arithmetic on integers, or converting from integer to pointer?

hvdijk · on April 15, 2022

Sorry to disappoint, but the compilers for nearly 100% of the server, desktop, and mobile computers in the real world do not allow arbitrary pointer arithmetic. They cannot issue an error message for it because that is provably impossible to detect in the general case, but they do optimise on the basis that it does not happen, even if it breaks programs that try to make use of it. Consider this example for GCC:

    int a[2], b[2];
    int offset(void) { return b - a; }
    int check(void) { return &a[1] + offset() == &b[1]; }

The function check() is optimised by GCC to return zero at -O1 optimisation level or higher, because it reasons that no matter how offset() is implemented, either the addition is undefined, or the comparison results in false.

(Note that GCC does this even in a few cases where it is unclear whether the optimisation is valid. The example I provided is slightly more complicated than I would have liked to avoid that issue; the optimisation is definitely valid in this example.)

xscott · on April 15, 2022

I originally responded to your code, and you're right about it. That's not what I expected or would want to happen.

However, your code doesn't actually address my comment about conversions and arithmetic. Where does this code fail?

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int a[2], b[2];
        uintptr_t x = (uintptr_t)a;
        uintptr_t y = (uintptr_t)b;
        uintptr_t offset = y - x;
        int* p = (int*)(x + offset);
        printf("b == p: %d\n", b==p);
        return 0;
    }

I didn't try _every_ compiler on Godbolt, but I didn't see it misbehave anywhere I did try.

edit: changed to uintptr_t

hvdijk · on April 15, 2022

That is integer arithmetic, not pointer arithmetic. My understanding of the C memory model that is mentioned in the article (look for PNVI-ae-udi) is that what you are doing is or will be well-defined. I suspect there may still be a few implementations around where the offset you get from integer subtractions has no obvious relation to the offset you would get from pointer subtractions (for two pointers where subtraction is well-defined), but for your example that makes no difference.
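The part of this that the standard already guarantees can be sketched as follows, assuming the implementation provides the optional `uintptr_t` type (the function name `round_trip` is hypothetical): a valid `void *` converted to `uintptr_t` and back compares equal to the original pointer.

```c
#include <stdint.h>

/* Hypothetical helper: convert a pointer to an integer and back.
   The casts go through void * because that is the conversion the
   standard describes for uintptr_t. */
int *round_trip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;  /* pointer -> integer */
    return (int *)(void *)bits;             /* integer -> pointer */
}
```

What happens when the integer is modified in between, as in the example being discussed, is the question that the provenance proposals try to settle.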

xscott · on April 15, 2022

> That is integer arithmetic, not pointer arithmetic.

Yeah, but I did say: "which operation could any C compiler disallow in practice: converting from pointer to integer, arithmetic on integers, or converting from integer to pointer?"

If C/C++ compilers keep breaking pointer arithmetic in the game of exploiting undefined behavior for optimizations, people are going to start doing pointer arithmetic with integers when they need it. And they do need it sometimes, for debuggers, profilers, memory checkers, JITs, garbage collectors, shared memory, and so on.

hvdijk · on April 16, 2022

That may be good. If pointer arithmetic on integers, and other constructs that have the effect of disabling optimisations, are confined to the code that has requirements beyond what the standard guarantees, then the rest of the code out there, which I suspect is the majority, can continue to be aggressively optimised.

spc476 · on April 15, 2022

There's nothing in the C standard that I could find that dictates that b[] has to follow directly after a[] in memory. That it works is just happenstance (or an implementation detail). The only place where order is maintained is the fields in a structure definition.

xscott · on April 15, 2022

There's nothing in that example that assumes they are adjacent. It only assumes they are in the same (flat) address space. Put a gigabyte between them if you want.

If I had used `uintptr_t` instead of `ssize_t`, I think it's even compliant as far as wraparound goes.

edit: Note I changed the code to use uintptr_t

xscott · on April 15, 2022

Same compiler and optimization level provides different behavior when you ask it to compile as C vs C++. Gross.

spc476 · on April 15, 2022

I used to program on a platform where you could have two pointers (say, integer pointers) that were not equal to each other, yet a store to one would change the location pointed to by the other pointer. Hint: it was a very popular architecture in the 80s and 90s. Spoiler: The 8088. I'm not saying I want to go back to that era (yes, I do like flat address spaces) but even on modern systems today it's possible to map memory such that two different pointers point to the same physical memory (but yes, you have to go out of your way to do that these days).
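To make the 8088 case concrete, here is a rough sketch of real-mode address translation (the function name `linear_address` is made up for illustration): the 16-bit segment is shifted left by four bits and added to the 16-bit offset, and the result is truncated to the 20-bit address bus. Many different segment:offset pairs therefore name the same byte.

```c
#include <stdint.h>

/* Hypothetical helper: physical address of a real-mode segment:offset
   pair on an 8088, which has 20 address lines. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    uint32_t addr = ((uint32_t)segment << 4) + offset;
    return addr & UINT32_C(0xFFFFF);  /* wrap at 1 MiB */
}
```

For instance, 1234:0010 and 1235:0000 differ as segment:offset pairs but both translate to 0x12350.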

If I could, I would disallow integer to pointer---there are (in my opinion) better ways to address a hardware register than to do

    unsigned char *uart = (unsigned char *)0xdeadbeef; /* [1] */

But I don't think it will go away any time soon in C (30+ years later, and we're still a few years out from ones' complement and signed magnitude integers being deprecated).

[1] Here's one way using NASM:

         global   uart
         absolute 0xdeadbeef
    uart resb 1

Then in C code:

    extern unsigned char uart;

xscott · on April 15, 2022

> I used to program on a platform [...]

Yeah, and I know that's why the C standard has so much slop in it. They want to keep the window of conformance open for past or future architectures.

> If I could, I would disallow integer to pointer [...]

It's not just about 0xDeadBeef hardware addresses, and I don't think anyone is talking about breaking assembly language.

However, C has unions and Rust has enums. You want those to share the same piece of memory in the fewest bytes you can get away with. Some interpreters, written in C, even use NaN-tagging to store the important 48 bits of a pointer in the 52 bits of payload of a double-precision float.
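A minimal sketch of the NaN-tagging idea, under loudly stated assumptions: IEEE 754 binary64 doubles, a 64-bit platform whose pointers fit in 48 bits, and a tag pattern chosen here purely for illustration. The names `PTR_TAG`, `PAYLOAD_MASK`, `box_ptr`, `is_boxed_ptr` and `unbox_ptr` are hypothetical and not taken from any particular interpreter.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: sign bit, all exponent bits, the quiet-NaN bit and one
   extra tag bit are set; the low 48 bits carry the pointer. */
#define PTR_TAG      UINT64_C(0xFFFC000000000000)
#define PAYLOAD_MASK UINT64_C(0x0000FFFFFFFFFFFF)

/* Pack a pointer into the payload of a quiet NaN. */
double box_ptr(void *p)
{
    uint64_t bits = PTR_TAG | ((uint64_t)(uintptr_t)p & PAYLOAD_MASK);
    double d;
    memcpy(&d, &bits, sizeof d);  /* reinterpret the bits without aliasing issues */
    return d;
}

/* True if the double carries the pointer tag rather than a number. */
int is_boxed_ptr(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (bits & PTR_TAG) == PTR_TAG;
}

/* Recover the pointer; only meaningful when is_boxed_ptr(d) is true. */
void *unbox_ptr(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (void *)(uintptr_t)(bits & PAYLOAD_MASK);
}
```

Note that the unboxing step is exactly an integer-to-pointer conversion, which is why this technique depends on that conversion remaining usable.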

You don't have to like it, and you can discourage your children or coworkers from doing it, but there are legitimate reasons for all of this. The examples look ugly because they're short and stupid.

And nobody has even mentioned that `A[i] == *(A + i) == *(i + A) == i[A]` is going to continue to be valid in C. https://stackoverflow.com/a/381549
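A short sketch of that equivalence, with a hypothetical function name: subscripting is defined in terms of pointer addition and dereference, and addition commutes, so all four spellings read the same element.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the four spellings of "element i
   of A" yield the same value and the same address. */
int all_forms_agree(const int *A, size_t i)
{
    return A[i] == *(A + i)
        && *(A + i) == *(i + A)
        && *(i + A) == i[A]
        && &A[i] == &i[A];
}
```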