Author: | Wojciech Muła |
---|---|
Added on: | 2014-10-09 |
Naive conversion from an ASCII digit to a value could be coded with a switch instruction or a lookup table — too simple, right? This is another note where I try to exploit PEXT — parallel extract, the new instruction introduced in extension BMI2 (Bit Manipulation Instructions). By the way I present nice branchless algorithm to convert an ASCII letter to a number.
Let see which codes are assigned to hexadecimal digits:
digit | code | value | letter | code | value |
---|---|---|---|---|---|
'0' | 0x30 | 0 | 'a' | 0x61 | 10 |
'1' | 0x31 | 1 | 'b' | 0x62 | 11 |
'2' | 0x32 | 2 | 'c' | 0x63 | 12 |
'3' | 0x33 | 3 | 'd' | 0x64 | 13 |
'4' | 0x34 | 4 | 'e' | 0x65 | 14 |
'5' | 0x35 | 5 | 'f' | 0x66 | 15 |
'6' | 0x36 | 6 | 'A' | 0x41 | 10 |
'7' | 0x37 | 7 | 'B' | 0x42 | 11 |
'8' | 0x38 | 8 | 'C' | 0x43 | 12 |
'9' | 0x39 | 9 | 'D' | 0x44 | 13 |
'E' | 0x45 | 14 | |||
'F' | 0x46 | 15 |
Observations:
Branchless code converting single letter to numeric value:
uint8_t hex_letter2value(uint8_t chr) { const uint8_t letter = chr & 0x40; const uint8_t shift = letter >> 3 | letter >> 6; // 9 if chr is letter, 0 otherwise // this sum is safe -- if shift = 9, then max value in lower // nibble is 6, and there won't be an overflow const uint8 adjusted = chr + shift; return adjusted & 0xf; }
Following operations are performed:
For a single letter it's too expansive, fortunately this algorithm could be easily translated to SIMD and SWAR code (I hope SIMD version appear soon).
32-bit SWAR version:
#define packed32(b) (uint8_t(b) * 0x01010101) uint16_t four_letters_to_value(char* str) { const uint32_t input = bswap(*(uint32_t*)str); // bswap is required const uint32_t letter = input & packed32(0x40); const uint32_t shift = letter >> 3 | letter >> 6; const uint32_t adjusted = input + shift; // for example: // adjusted = 0x0b0a0d07 // pext result = 0x000000bad7 return pext(adjusted, 0x0f0f0f0f); }
Much better — now conversion from 4 letters (or 8, when operate on 64-bit words) requires:
Sample implementation is available at github.