
eg 4:

   int foo(char const *s) {
     if (s[0] == 'h' && s[1] == 'e' && s[2] == 'l' && s[3] == 'l')
        return 1;
     return 0;
   }
This outputs 4 cmp instructions here, even though I'd have thought 1 was sufficient. https://godbolt.org/z/hqMnbrnKe


`s[0] == 'h'` isn't sufficient to guarantee that `s[3]` can be accessed without a segfault, so the compiler is not allowed to perform this optimization.

If you use `&` instead of `&&` (so that all array elements are accessed unconditionally), the optimization will happen: https://godbolt.org/z/KjdT16Kfb
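For reference, a minimal sketch of the two variants side by side. The function names `foo_short_circuit` and `foo_unconditional` are made up here for illustration; the second one is only safe when the caller guarantees at least 4 readable bytes at `s`.

```c
/* Short-circuit version: s[1], s[2], s[3] are only read if every
   earlier byte matched, so a short string like "h" is safe. */
int foo_short_circuit(char const *s) {
    if (s[0] == 'h' && s[1] == 'e' && s[2] == 'l' && s[3] == 'l')
        return 1;
    return 0;
}

/* Bitwise-& version: all four bytes are read unconditionally.
   ASSUMPTION: the caller guarantees s points to at least 4 readable
   bytes; otherwise this reads out of bounds. */
int foo_unconditional(char const *s) {
    if ((s[0] == 'h') & (s[1] == 'e') & (s[2] == 'l') & (s[3] == 'l'))
        return 1;
    return 0;
}
```

Because the second version already promises to touch all four bytes, a compiler is free to merge the four byte loads into one wider load, which is what the godbolt link above shows.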

(also note you got the endianness wrong in your hand-optimized version)


> If you use `&` instead of `&&` (so that all array elements are accessed unconditionally), the optimization will happen

But then you're accessing four elements of a string that could have a strlen of less than 3. If the strlen is 1 then the short circuit case saves you because s[1] will be '\0' instead of 'e' and then you don't access elements past the end of the string. The "optimized" version is UB for short strings.


Yes, so that's why the compiler can't and doesn't emit the optimized version if you write the short circuited version - because it behaves differently for short strings.


UB doesn't exist in the processor (it does, but not here). If the compiler knows the pointer is aligned it can do the transformation.


For the compiler to know the pointer is aligned it would have to actually be aligned and there is no guarantee that it is.
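The alignment argument in this sub-thread comes down to page arithmetic: a 4-byte load from a 4-byte-aligned address can never straddle a page boundary, so it cannot fault if its first byte is readable. A rough sketch of that check, where the helper name `read_crosses_page` and the 4096-byte page size used in the usage note are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a read of `width` bytes starting at `addr` would touch
   the page after the one containing `addr`, 0 otherwise.
   ASSUMPTION: page_size is nonzero and width <= page_size. */
int read_crosses_page(uintptr_t addr, size_t width, size_t page_size) {
    size_t offset = (size_t)(addr % page_size);
    return offset + width > page_size;
}
```

With 4096-byte pages, `read_crosses_page(0x2000fff, 4, 4096)` is 1 (an unaligned 4-byte read starting at the last byte of a page), while `read_crosses_page(0x2000ffc, 4, 4096)` is 0 (the last 4-byte-aligned slot in that page).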


This is fantastic, thanks! This is the approach I use in httpdito to detect the CRLFCRLF that terminates an HTTP/1.0 GET request, but I'm doing it in assembly.


Ooo, I'd never thought of using & like that. Interesting.

> (also note you got the endianness wrong in your hand-optimized version)

Doh :-)


Matt Godbolt's talk on ray tracers shows how effective that change can be. I think it was that talk, anyway.

https://www.youtube.com/watch?v=HG6c4Kwbv4I


good ol' short circuiting


If you want to tell the compiler not to worry about the possible buffer overrun then you can try `int foo(char const s[static 4])`. Or use `&` instead of `&&` to ensure that there is no short-circuiting, e.g. `if ((s[0] == 'h') & (s[1] == 'e') & (s[2] == 'l') & (s[3] == 'l'))`. Either way, this then compiles down to a single 32-bit comparison.

Interestingly, it is comparing against a different 32-bit value than `bar` does. I think this is because you accidentally got the order backwards in `bar`.

The code in `bar` is probably not a good idea on targets that don't like unaligned loads.
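A sketch of these points in C. The original `bar` is not reproduced in this thread, so `bar_memcpy` below is a hypothetical stand-in, not the poster's code; `foo_static4` and `load_le32` are likewise illustrative names. `memcpy` is used for the 4-byte load so the code does not rely on the pointer being aligned, and the expected value is built from the same bytes so the comparison is independent of the host's endianness.

```c
#include <stdint.h>
#include <string.h>

/* C99 `static 4`: the caller promises s points to at least 4 chars.
   Whether a compiler uses that promise to merge the loads is up to it. */
int foo_static4(char const s[static 4]) {
    if (s[0] == 'h' && s[1] == 'e' && s[2] == 'l' && s[3] == 'l')
        return 1;
    return 0;
}

/* Assemble 4 bytes into a 32-bit value in little-endian order
   (s[0] is the least significant byte), regardless of host. */
uint32_t load_le32(char const *s) {
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}

/* Hypothetical hand-written single comparison.
   ASSUMPTION: s points to at least 4 readable bytes. */
int bar_memcpy(char const *s) {
    uint32_t word, want;
    memcpy(&word, s, sizeof word);      /* no alignment requirement */
    memcpy(&want, "hell", sizeof want); /* same byte order as word */
    return word == want;
}
```

On a little-endian target the constant for "hell" is `0x6c6c6568` (`'h'` is `0x68` and lands in the low byte); writing `0x68656c6c` instead is the big-endian value, which is the kind of reversal described above.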


That's because the 1 instruction variant may read past the end of an array. Let's say s is a single null byte at 0x2000fff, for example (and that memory is only mapped through 0x2001000); the function as written is fine, but the optimized version may page fault.
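This scenario can be reproduced on a POSIX system by putting a single null byte at the end of a readable page and making the following page inaccessible. The sketch below assumes Linux-style `mmap` with `MAP_ANONYMOUS`; the helper name `last_byte_before_guard_page` is made up for illustration, and the mapping is deliberately never freed to keep it short.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* The short-circuit function from the top of the thread. */
int foo(char const *s) {
    if (s[0] == 'h' && s[1] == 'e' && s[2] == 'l' && s[3] == 'l')
        return 1;
    return 0;
}

/* Maps two pages, makes the second one inaccessible, and returns a
   pointer to the last byte of the first page, set to '\0'.
   Returns NULL on failure. */
char *last_byte_before_guard_page(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    size_t len = (size_t)page;
    char *base = mmap(NULL, 2 * len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + len, len, PROT_NONE) != 0) {
        munmap(base, 2 * len);
        return NULL;
    }
    base[len - 1] = '\0'; /* a valid empty string in one readable byte */
    return base + len - 1;
}
```

Calling `foo` on the returned pointer is well defined and returns 0, since only `s[0]` is read. A version that unconditionally loaded 4 bytes from the same pointer would touch the `PROT_NONE` page and would be expected to fault.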


Ah, yes, good point. I think this is a nice example of "I didn't notice I needed to tell the compiler a thing I know so it can optimize".



