Faster Arithmetic by Flipping Signs
When I introduced fused multiply-add support to ARM64, I looked at the function as it was before I used the trick, along with its generated assembly. I noticed that we had a multiplication instruction followed by a subtraction instruction before our final fused multiply-add instruction.
Because we multiply our input value by a constant, we control its sign. We can use that fact to change our instruction into one that can work with another constant directly from memory. On PowerPC, ARM, and many other platforms, an instruction cannot use a value until that value has been explicitly loaded from memory into a register with a load instruction.
Interestingly, Clang 8 figures out that it can use the sign flip trick all on its own.
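The sketch below shows the algebra behind the trick in scalar C++. It is my own minimal illustration, not the function from the original code. `remap_before`, `remap_after`, `check_remap_agrees`, `kScale`, and `kOffset` are hypothetical names, and I assume the value has the form `offset - input * scale`. Because we choose the constant `scale`, we can store it negated. The expression then becomes `input * (-scale) + offset`, which is exactly what a single fused multiply-add (`std::fma`) computes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical constants for illustration only; not the values from the original function.
constexpr float kScale = 2.0f;
constexpr float kOffset = 1.0f;

// Before: multiply the input by a constant, then subtract the product from another constant.
float remap_before(float input)
{
    return kOffset - input * kScale;
}

// After: store the multiplier with its sign flipped. The subtraction becomes an addition,
// so the whole expression is input * (-kScale) + kOffset, a single fused multiply-add.
constexpr float kNegScale = -kScale;

float remap_after(float input)
{
    return std::fma(input, kNegScale, kOffset);
}

// Usage: for inputs whose products are exactly representable (as here),
// both forms produce identical results.
void check_remap_agrees()
{
    assert(remap_before(3.0f) == remap_after(3.0f));    // 1 - 3 * 2 == -5
    assert(remap_before(0.25f) == remap_after(0.25f));  // 1 - 0.25 * 2 == 0.5
}
```

Two caveats apply to this sketch. First, a fused multiply-add rounds once instead of twice, so for general inputs the two versions can differ in the last bit. Second, on x86 the flipped form helps even without FMA. `subps` and `vsubps` require the minuend to be in a register, so computing `offset - product` forces `offset` to be loaded first. Addition commutes, so `addps` can read the constant straight from memory.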
Source: nfrechette.github.io