One optimization for the C code is to put "f" suffixes on the floating point constants. For example convert this line:
    t[i] += 0.02 * (float)j;
to:
    t[i] += 0.02f * (float)j;
I believe this helps because 0.02 is a double and doing double * float and then converting the result to float can produce a different answer to just doing float * float. The compiler has to do the slow version because that's what you asked for.
Adding the -ffast-math switch appears to make no difference. I'm never sure what -ffast-math does exactly.
> I believe this helps because 0.02 is a double and [...] can produce a different answer
In principle, not quite. The real, unavoidable (by the compiler) problem is that 0.02 is not a dyadic rational (it is not representable exactly as some integer over a power of two). So its representation as a double (rounded to a 53-bit significand) is a different real number than its representation as a float (rounded to a 24-bit significand). (This is the same problem as rounding pi or e to a double/float, but people tend to forget that it applies to all non-dyadic rationals, not just irrationals.)
If, instead of `0.02f`, you replaced `0.02` with `(double)0.02f` or `0.015625`, the optimization should in theory still apply (although missed-optimization compiler bugs are of course possible).
I think this is because the optimization isn't safe. I wrote a program to find a counterexample to your claim that "the optimization should in theory still apply". It found one. Here's the code:
    #include <stdio.h>
    #include <stdlib.h>

    float mul_as_float(float t) {
        t += 0.02f * (float)17;
        return t;
    }

    float mul_as_double(float t) {
        t += (double)0.02f * (float)17;
        return t;
    }

    int main() {
        while (1) {
            unsigned r = rand();
            float t = *((float*)&r);
            float result1 = mul_as_float(t);
            float result2 = mul_as_double(t);
            if (result1 != result2) {
                printf("Counter example when t is %f (0x%x)\n", t, *((unsigned*)&t));
                printf("result1 is %f (0x%x)\n", result1, *((unsigned*)&result1));
                printf("result2 is %f (0x%x)\n", result2, *((unsigned*)&result2));
                return 0;
            }
        }
    }
It outputs:
    Counter example when t is 0.000000 (0x3477d43f)
    result1 is 0.340000 (0x3eae1483)
    result2 is 0.340000 (0x3eae1482)
On my machine, the compiler constant-folds the multiplication, producing a single-precision add for `mul_as_float` and a convert-t-to-double, double-precision add, convert-sum-to-single sequence for `mul_as_double`. I missed the `+=` in your original comment, but adding a float to a double does implicitly promote it like that, so you'd actually need:
    t += (float)((double)0.02f * (float)17);
to achieve the "and then converting the result [of the multiplication] to float" (rather than keeping it a double for the addition) from your original comment. (With the above line in mul_as_double, your test code no longer finds a counterexample, at least when I ran it.)
If you ask for higher-precision intermediates, even implicitly, floating-point compilers will typically give them to you, hoped-for efficiency of single precision be damned.
> I'm never sure what -ffast-math does exactly.
Me too, when I have been away from C for a while.
The topic has come up on HN before [3]. Among other things, -ffast-math can:
* enable the use of SIMD instructions
* alter the behavior regarding NaN (you can't even check for NaN afterwards with isnan(f))
* alter the associativity of expressions: a+(b+c) might become (a+b)+c, which seems harmless at first, but there are exceptions (for an example, see [1] under -fassociative-math)
* flush subnormals to zero (even if your program isn't compiled with this option but a library you link against is).
A nice overview, which I have summarized here, is in [1]; it links to [2], which has this nice line:
"If a sufficiently advanced compiler is indistinguishable from an adversary, then giving the compiler access to -ffast-math is gifting that enemy nukes. That doesn’t mean you can’t use it! You just have to test enough to gain confidence that no bombs go off with your compiler on your system"
> Adding the -ffast-math switch appears to make no difference. I'm never sure what -ffast-math does exactly.
Minimal case on Godbolt:
https://godbolt.org/z/W18YsnMY5 - without the f
https://godbolt.org/z/oc1s8WKeG - with the f