| Author: | Wojciech Muła |
|---|---|
| Added on: | 2014-10-09 |
Naive conversion from an ASCII digit to a value could be coded with a switch instruction or a lookup table — too simple, right? This is another note where I try to exploit PEXT — parallel extract, the new instruction introduced in extension BMI2 (Bit Manipulation Instructions). By the way I present nice branchless algorithm to convert an ASCII letter to a number.
Let see which codes are assigned to hexadecimal digits:
| digit | code | value | letter | code | value |
|---|---|---|---|---|---|
| '0' | 0x30 | 0 | 'a' | 0x61 | 10 |
| '1' | 0x31 | 1 | 'b' | 0x62 | 11 |
| '2' | 0x32 | 2 | 'c' | 0x63 | 12 |
| '3' | 0x33 | 3 | 'd' | 0x64 | 13 |
| '4' | 0x34 | 4 | 'e' | 0x65 | 14 |
| '5' | 0x35 | 5 | 'f' | 0x66 | 15 |
| '6' | 0x36 | 6 | 'A' | 0x41 | 10 |
| '7' | 0x37 | 7 | 'B' | 0x42 | 11 |
| '8' | 0x38 | 8 | 'C' | 0x43 | 12 |
| '9' | 0x39 | 9 | 'D' | 0x44 | 13 |
| 'E' | 0x45 | 14 | |||
| 'F' | 0x46 | 15 | |||
Observations:
Branchless code converting single letter to numeric value:
uint8_t hex_letter2value(uint8_t chr) {
const uint8_t letter = chr & 0x40;
const uint8_t shift = letter >> 3 | letter >> 6; // 9 if chr is letter, 0 otherwise
// this sum is safe -- if shift = 9, then max value in lower
// nibble is 6, and there won't be an overflow
const uint8 adjusted = chr + shift;
return adjusted & 0xf;
}
Following operations are performed:
For a single letter it's too expansive, fortunately this algorithm could be easily translated to SIMD and SWAR code (I hope SIMD version appear soon).
32-bit SWAR version:
#define packed32(b) (uint8_t(b) * 0x01010101)
uint16_t four_letters_to_value(char* str) {
const uint32_t input = bswap(*(uint32_t*)str); // bswap is required
const uint32_t letter = input & packed32(0x40);
const uint32_t shift = letter >> 3 | letter >> 6;
const uint32_t adjusted = input + shift;
// for example:
// adjusted = 0x0b0a0d07
// pext result = 0x000000bad7
return pext(adjusted, 0x0f0f0f0f);
}
Much better — now conversion from 4 letters (or 8, when operate on 64-bit words) requires:
Sample implementation is available at github.