Using PEXT to convert from hexadecimal ASCII to number

Author:Wojciech Muła
Added on:2014-10-09

Naive conversion from an ASCII digit to a value could be coded with a switch instruction or a lookup table — too simple, right? This is another note where I try to exploit PEXTparallel extract, the new instruction introduced in extension BMI2 (Bit Manipulation Instructions). By the way I present nice branchless algorithm to convert an ASCII letter to a number.

Let see which codes are assigned to hexadecimal digits:

digit code value letter code value
'0' 0x30 0 'a' 0x61 10
'1' 0x31 1 'b' 0x62 11
'2' 0x32 2 'c' 0x63 12
'3' 0x33 3 'd' 0x64 13
'4' 0x34 4 'e' 0x65 14
'5' 0x35 5 'f' 0x66 15
'6' 0x36 6 'A' 0x41 10
'7' 0x37 7 'B' 0x42 11
'8' 0x38 8 'C' 0x43 12
'9' 0x39 9 'D' 0x44 13
  'E' 0x45 14
'F' 0x46 15

Observations:

Branchless code converting single letter to numeric value:

uint8_t hex_letter2value(uint8_t chr) {

    const uint8_t letter = chr & 0x40;
    const uint8_t shift  = letter >> 3 | letter >> 6; // 9 if chr is letter, 0 otherwise

    // this sum is safe -- if shift = 9, then max value in lower
    // nibble is 6, and there won't be an overflow
    const uint8 adjusted = chr + shift;

    return adjusted & 0xf;
}

Following operations are performed:

For a single letter it's too expansive, fortunately this algorithm could be easily translated to SIMD and SWAR code (I hope SIMD version appear soon).

32-bit SWAR version:

#define packed32(b) (uint8_t(b) * 0x01010101)

uint16_t four_letters_to_value(char* str) {

    const uint32_t input = bswap(*(uint32_t*)str); // bswap is required

    const uint32_t letter = input & packed32(0x40);
    const uint32_t shift  = letter >> 3 | letter >> 6;

    const uint32_t adjusted = input + shift;

    // for example:
    //     adjusted    = 0x0b0a0d07
    //     pext result = 0x000000bad7

    return pext(adjusted, 0x0f0f0f0f);
}

Much better — now conversion from 4 letters (or 8, when operate on 64-bit words) requires:

Sample implementation is available at github.