Conditionally fill word (for limited set of input values)

Article:Wojciech Muła
Added on:2014-10-01
Updated on:2018-03-11

"Limited" means a value where at most one bit is set. I.e. values are zero and all powers of two. Specification:

  1. if x is zero then result is also zero,
  2. if x is power of two result is word full of zeros.

Naive implementation:

uint32_t fill_word_naive(uint32_t x) {

    return x ? 0xffffffff : 0x00000000;
}

GCC 4.8.2 produces following code:

89 c2        mov    %eax,%edx
f7 da        neg    %edx
21 c2        and    %eax,%edx
39 c2        cmp    %eax,%edx
0f 94 c0     sete   %al
0f b6 c0     movzbl %al,%eax

Not bad, but this can be done much simplier. We know that a value is zero or have exactly one bit set — first we have to copy this bit to the highest position, and then populate the MSB using arithmetic shift right.

Copying a bit can be done using single operation — arithmetic negation:

x -x
00000000 00000000
00000001 ffffffff
00000002 fffffffe
00000004 fffffffc
00000008 fffffff8
00000010 fffffff0
00000020 ffffffe0
00000040 ffffffc0
00000080 ffffff80
00000100 ffffff00
00000200 fffffe00
00000400 fffffc00
00000800 fffff800
00001000 fffff000
00002000 ffffe000
00004000 ffffc000
00008000 ffff8000
00010000 ffff0000
00020000 fffe0000
00040000 fffc0000
00080000 fff80000
00100000 fff00000
00200000 ffe00000
00400000 ffc00000
00800000 ff800000
01000000 ff000000
02000000 fe000000
04000000 fc000000
08000000 f8000000
10000000 f0000000
20000000 e0000000
40000000 c0000000
80000000 80000000

Now the procedure could be saved as:

uint32_t fill_word_naive(uint32_t x) {

    return ((int32_t)(-x) >> 31);
}

The compilation result:

f7 d8        neg    %eax
c1 f8 1f     sar    $0x1f,%eax

Just two simple instructions.

2018-03-11: GCC 7.2 still compiles fill_word_naive the same way, but clang 6.0 produces the final, two-instruction sequence.