Author: | Wojciech Muła |
---|---|
Added on: | 2010-06-09 |
Suppose we need to get a mask when a nonnegative argument is greater then some constant value; in other words, we want to evaluate following expression:
if x > const_n then mask := 0xffffffff; else mask := 0x00000000;
Portable branchless solution:
The key to understand this trick is binary form of M: 0111..1111zzzz, where z is 0 or 1 depending on n value. When x is greater then n, then x + M has form 1000..000zzzz, because the carry bit propagates through series of ones to the k-th position of the result.
Real world example — branchless converting hex digit to ASCII (M=0x7ffffff6 for k=31 and n=9).
; input: eax - hex digit ; output: eax - ASCII letter (0-9, A-F or a-f) ; destroys: ebx andl 0xf, %eax leal 0x7ffffff6(%eax), %ebx ; MSB(ebx)=1 when eax >= 10 sarl $31, %ebx ; ebx - mask andl $7, %ebx ; ebx = 7 when eax >= 10 (for A-F letters) ;andl $39, %ebx ; ebx = 39 when eax >= 10 (for a-f letters) leal '0'(%eax, %ebx), %eax ; eax = '0' + eax + ebx => ASCII letter
It is also possible to convert 4 hex digits in parallel using similar algorithm, but the input data have to be correctly prepared. Moreover generating mask requires 3 instructions and one extra register (in a scalar version just one arithmetic shift). I guess it wont be fast on x86, maybe this approach would be good for a SIMD code, where similar code transforms more bytes at once.
; input: eax - four hex digits in form [0a0b0c0d] ; output: eax - four ascii letters ; destroys: ebx, ecx leal 0x76767676(%eax), %ebx ; MSB of each byte is set when corresponding eax byte is >= 10 ; (here: 0x7f - 9 = 0x76) andl $0x80808080, %ebx movl %ebx, %ecx shrl $7, %ebx subl %ebx, %ecx ; ecx - byte-wise mask ;andl $0x07070707, %ecx ; for ASCII letters A-F andl $0x27272727, %ecx ; for ASCII letters a-f leal 0x30303030(%eax, %ecx), %eax ; ecx - four ascii letters
See also: SSSE3: printing hex values (weird use of PSHUFB instruction)