I'm of course not suggesting branching in cases where you expect a 30% misprediction rate. You'd do a branchless reduction for [-2*pi; 2*pi] or whatever range you expect to be frequent, and branch only on inputs with magnitude greater than 2*pi if you want to be extra sure you don't get wrong results if usage changes.
Again, we're in a situation where we know we can tolerate a 0.5% error; we can spare a bit of time to think about which range needs to be handled fast, or supported at all.
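A minimal sketch of that split in C (function name and structure are mine, not from the thread): fold the common [-2*pi; 2*pi] range into [-pi, pi] branchlessly, and branch only on the rare larger magnitudes, falling back to a full reduction:

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Fold x into [-pi, pi] for a range-limited sin approximation.
       Inputs in [-2*pi, 2*pi] take the branchless path; larger
       magnitudes hit a (rare, well-predicted) branch and a full,
       slower reduction via remainder(). */
    static double reduce_angle(double x) {
        if (fabs(x) > 2.0 * M_PI)
            return remainder(x, 2.0 * M_PI);  /* exact fold into [-pi, pi] */

        /* 1.0 where |x| > pi, else 0.0; compiles to compare + select,
           not a jump. */
        double over = (double)(fabs(x) > M_PI);
        return x - over * copysign(2.0 * M_PI, x);
    }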
Those reductions need to be part of the function being benchmarked, though. Even assuming a range limitation of [-pi, pi] would be reasonable; there are certainly cases where you don't need multiple revolutions around a circle. But this can't even do that, so it's simply not a substitute for sin, and claiming 40x faster is a sham.
Right; the range reduction from [-pi; pi] would be something like 5 instructions ("x -= (2*x - copysign(pi, x)) & (abs(x) > pi/2)" in mask-style pseudocode, using sin(x) = sin(pi - x)), ~2 cycles throughput-wise, I think; that's slightly more significant than I was imagining, hmm.
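A scalar C version of that fold, for reference (a sketch with my own naming; a SIMD variant would express the compare as an all-ones mask as in the pseudocode above):

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Fold [-pi, pi] into [-pi/2, pi/2] without a branch, using
       sin(x) = sin(pi - x) (mirrored for negative x) so the folded
       input still evaluates to the same sine. */
    static double fold_half_pi(double x) {
        double over = (double)(fabs(x) > M_PI / 2.0);  /* 1.0 or 0.0 */
        return x + over * (copysign(M_PI, x) - 2.0 * x);
    }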
It's indeed not a substitute for sin in general, but it could be in some use-cases, and for those it could really be 40x faster: say, cases where you're already doing range reduction externally because it's needed for some other reason (in general you don't want your angles accumulating magnitude forever).
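As an illustration of that kind of use-case (hypothetical names, including fast_sin, not from the thread): an animation loop that already wraps its accumulated phase each step so it doesn't grow without bound, which hands the range-limited sin a [-pi, pi] input for free:

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    static double phase = 0.0;  /* stays in [-pi, pi] by construction */

    void step(double angular_velocity, double dt) {
        phase += angular_velocity * dt;
        /* Wrap every step anyway, to keep precision from degrading
           as the angle accumulates -- so the fast sin's range limit
           costs nothing extra here. */
        phase = remainder(phase, 2.0 * M_PI);
        /* ... double s = fast_sin(phase);  // may assume [-pi, pi] */
    }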