Author: | Wojciech Muła |
---|---|
Added on: | 2014-09-30 |
There are some places where a low-level programmer can beat a compiler. Consider this simple code:
    #include <stdint.h>

    uint32_t bsr(uint32_t x) {
        // xor, because this builtin returns 31 - bsr(x)
        return __builtin_clz(x) ^ 31;
    }

    uint32_t min1(uint32_t x) {
        if (x != 0) {
            return bsr(x) + 1;
        } else {
            return 1;
        }
    }
The function min1 is compiled to (GCC 4.8 with the -O3 flag):
    min1:
            movl    4(%esp), %edx
            movl    $1, %eax
            testl   %edx, %edx
            je      .L3
            bsrl    %edx, %eax
            addl    $1, %eax
    .L3:
            rep ret
There is a conditional jump, which is not very good. We can rewrite the function:
    uint32_t min2(uint32_t x) {
        return bsr(x | 1) + 1;
    }
The result is this nice branchless code:
    min2:
            movl    4(%esp), %eax
            orl     $1, %eax
            bsrl    %eax, %eax
            addl    $1, %eax
            ret
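Why the rewrite is safe: setting bit 0 never moves the highest set bit of a non-zero value, and it turns zero into 1, for which bsr gives 0 and the function returns 1. Forcing the bit also means the builtin is never called with zero, where its result is undefined. The following is a minimal sketch of a sanity check, assuming GCC or Clang (for `__builtin_clz`). The helper `variants_agree` is a name made up for this illustration, and it only samples zero and values around powers of two, so it is a spot check rather than a proof.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit; the builtin is undefined for x == 0. */
uint32_t bsr(uint32_t x) {
    return (uint32_t)__builtin_clz(x) ^ 31;
}

/* Branchy variant from the text. */
uint32_t min1(uint32_t x) {
    if (x != 0) {
        return bsr(x) + 1;
    } else {
        return 1;
    }
}

/* Branchless variant from the text. */
uint32_t min2(uint32_t x) {
    return bsr(x | 1) + 1;
}

/* Hypothetical helper: returns 1 if both variants agree on zero,
   on every power of two, and on the values p - 1 and p | 1. */
int variants_agree(void) {
    if (min1(0) != min2(0)) return 0;
    for (int i = 0; i < 32; i++) {
        uint32_t p = UINT32_C(1) << i;
        if (min1(p) != min2(p)) return 0;
        if (min1(p - 1) != min2(p - 1)) return 0;
        if (min1(p | 1) != min2(p | 1)) return 0;
    }
    return 1;
}
```

Calling `variants_agree()` from a small test program should return 1 if the two variants match on the sampled inputs. An exhaustive loop over all 32-bit values is also feasible if a stronger check is wanted.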
Conclusion: it's worth checking the compiler output. Sometimes.