The move from gcc 4.8 to gcc 4.9 for arm32 introduced a bug in this
code. The original code tries to outsmart the compiler with
architecture-specific implementations, and that approach caught up with us.
Benchmarks show that the time saved by doing this is in the nanosecond
range, so just let the compiler figure things out on its own.
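For illustration only, here is a minimal sketch of the portable style this
change moves toward. It assumes that swapLongs reverses the byte order of each
64-bit element while copying a buffer. The name swap_longs and its signature
are hypothetical, and this is not the code in this change. The sketch uses
memcpy for possibly unaligned access and the GCC/Clang builtin
__builtin_bswap64, and it leaves instruction selection to the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the real swapLongs): copy `count` 64-bit
 * elements from src to dst, reversing the byte order of each one. */
void swap_longs(void *dst, const void *src, size_t count) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < count; i++) {
        uint64_t v;
        /* memcpy is well-defined for unaligned buffers; compilers
         * typically lower it to a single load or store. */
        memcpy(&v, s + i * sizeof v, sizeof v);
        v = __builtin_bswap64(v); /* GCC/Clang builtin byte reversal */
        memcpy(d + i * sizeof v, &v, sizeof v);
    }
}
```

No per-architecture branches appear here. Whether this compiles to the same
instructions as a hand-tuned version depends on the compiler and target.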
It turns out that for aarch64, x86, and x86_64, the compiler produces
exactly the same code as the hand-written version for two of the
functions. For swapLongs, x86/x86_64 produces slightly different code,
but performance is about the same.
For arm32, letting the compiler optimize also leads to about the same
performance.
Adding unit tests and benchmark code for these functions.
Bug: 19692084
Change-Id: I858eb3147ef1e9e2c1894ddb226cdddcc0baf933