Parsing hex numbers with validation

Authors:	Geoff Langdale Wojciech Muła
Added on:	2022-01-17
Last update:	2022-01-29

Contents

Introduction
Algorithm #1
Algorithm #2
Algorithm #3 — by Geoff Langdale
Comparison
Converting nibbles to binary value
Sample code

Introduction

A non-validating parsing from hexadecimal string can be vectorized: Using SSE to convert from hexadecimal ASCII to number. This text shows two validating and vectorized approaches.

Parsing algorithms convert 16-byte inputs. They consists two major parts:

Validate and convert ASCII digits and letters into 4-bit values (value [0..15]) stored on separate bytes.
Merge the 4-bit words into a continues 64-bit word.

Below are the valid ASCII hexadecimal digits:

0 ... 9 — 0x30 ... 0x31,
a ... f — 0x61 ... 0x66,
A ... F — 0x41 ... 0x46.

We can note that:

Valid characters have higher nibbles equal to 0x3, 0x4 or 0x6.
Values of decimal digits equal to the lower nibble.
Values of hex digits (above 9) equal to the lower nibble plus 9.

Algorithm #1

This algorithm is a straightforward application of the above observations. The following scalar code shows the steps we perform:

const uint8_t lo_nibble = input & 0x0f
uint8_t ascii_09;
uint8_t ascii_af;

if (lo_nibble <= 9)
    ascii_09 = lo_nibble;
else
    /* error */

if ((lo_nibble >= 1) and (lo_nibble <= 6))
    ascii_af = lo_nibble + 9;
else
    /* error */

const uint8_t hi_nibble = (input >> 4) & 0xff;
if hi_nibble == 0x3
    return ascii_09;
else if ((hi_nibble == 0x4) or (hi_nibbles == 0x6))
    return ascii_af;
else
    /* error */

The vectorized algorithm #1.

Sample input: "0123456789aBcDeF":

[46|65|44|63|42|61|39|38|37|36|35|34|33|32|31|30] -- input
 F  e  D  c  B  a  9  8  7  6  5  4  3  2  1   0

Isolate the lower nibbles:

[06|05|04|03|02|01|09|08|07|06|05|04|03|02|01|00] -- lo_nibbles

const __m128i mask = _mm_set1_epi8(0x0f);
const __m128i lo_nibbles = _mm_and_si128(input, mask);

Convert lower nibbles as they were decimal digits. If a nibble is not valid (above 9) set bit #7 to indicate error. Set also bit #6 to store the character class (constant digits).

[46|45|44|43|42|41|49|48|47|46|45|44|43|42|41|40] -- lo_ascii_09

const int error   = 0x80;
const int digits  = 0x40;
const int letters = 0x20;

const lookup1 = _mm_setr_epi8(
    /* 0 */ 0 | digits,
    /* 1 */ 1 | digits,
    /* 2 */ 2 | digits,
    /* 3 */ 3 | digits,
    /* 4 */ 4 | digits,
    /* 5 */ 5 | digits,
    /* 6 */ 6 | digits,
    /* 7 */ 7 | digits,
    /* 8 */ 8 | digits,
    /* 9 */ 9 | digits,
    /* a */ error | digits,
    /* b */ error | digits,
    /* c */ error | digits,
    /* d */ error | digits,
    /* e */ error | digits,
    /* f */ error | digits);

const __m128i lo_ascii_09 = _mm_shuffle_epi8(lookup1, input);

Similarly, convert lower nibbles as they were hex letters. If a nibble is not valid (less than 1 or greater than 6) set bit #7 to indicate error. Set also bit #5 to store the character class (constant letters).

[2f|2e|2d|2c|2b|2a|80|80|80|2f|2e|2d|2c|2b|2a|80] -- lo_ascii_af

const __m128i lookup2 = _mm_setr_epi8(
    /* 0 */ error | letters,
    /* 1 */ 10 | letters, // a
    /* 2 */ 11 | letters, // b
    /* 3 */ 12 | letters, // c
    /* 4 */ 13 | letters, // d
    /* 5 */ 14 | letters, // e
    /* 6 */ 15 | letters, // f
    /* 7 */ error | letters,
    /* 8 */ error | letters,
    /* 9 */ error | letters,
    /* a */ error | letters,
    /* b */ error | letters,
    /* c */ error | letters,
    /* d */ error | letters,
    /* e */ error | letters,
    /* f */ error | letters);

const __m128i lo_ascii_af = _mm_shuffle_epi8(lookup2, lo_nibbles);

Classify ASCII characters based on the higher nibbles. Split bytes into three classes: decimal digits, hex letters, invalid.

[20|20|20|20|20|20|40|40|40|40|40|40|40|40|40|40] -- input
 F  e  D  c  B  a  9  8  7  6  5  4  3  2  1  0

const __m128i lookup3 = _mm_setr_epi8(
    /* 0 */ error,
    /* 1 */ error,
    /* 2 */ error,
    /* 3 */ decimal,
    /* 4 */ letter,
    /* 5 */ letter,
    /* 6 */ error,
    /* 7 */ error,
    /* 8 */ error,
    /* 9 */ error,
    /* a */ error,
    /* b */ error,
    /* c */ error,
    /* d */ error,
    /* e */ error,
    /* f */ error);

const __m128i mask       = _mm_set1_epi8(0x0f);
const __m128i hi_nibbles = _mm_and_si128(_mm_srli_epi32(input, 4), mask);
const __m128i hi_class   = _mm_shuffle_epi8(lookup3, hi_nibbles);

Keep only the decimal digits using the classification from point #4.

[00|00|00|00|00|00|49|48|47|46|45|44|43|42|41|40] -- ascii_09

const __m128i zero = _mm_setzero_si128();
const __m128i ascii_09_mask = _mm_cmpeq_epi8(_mm_and_si128(hi_class, lo_ascii_09), zero);
const __m128i ascii_09 = _mm_andnot_si128(ascii_09_mask, lo_ascii_09);

Likewise, keep only the hex: letters using the classification from point #4.

[2f|2e|2d|2c|2b|2a|00|00|00|00|00|00|00|00|00|00] -- ascii_af

const __m128i ascii_af_mask = _mm_cmpeq_epi8(_mm_and_si128(hi_class, lo_ascii_af), zero);
const __m128i ascii_af = _mm_andnot_si128(ascii_af_mask, lo_ascii_af);

Check for errors: or together the converted characters and the classification. If the result vector has any 7th bit set, it means an error.

[2f|2e|2d|2c|2b|2a|49|48|47|46|45|44|43|42|41|40] -- res == err

const __m128i res = _mm_or_si128(ascii_09, ascii_af)
const __m128i err = _mm_or_si128(hi_class, res)
if (_mm_movemask_epi8(err1)) {
    ok = false;
    return 0;
}

Keep only lower nibbles.

[0f|0e|0d|0c|0b|0a|09|08|07|06|05|04|03|02|01|00] -- result

const __m128i result = _mm_and_si128(res, mask);

Reverse nibbles and join into a single 64-bit word — see separate section.

Algorithm #2

In this algorithm we change the validation rules. We use higher nibbles to fetch allowed ranges for lower nibbles. For decimal digits (higher nibble 0x3) the range is 0..9. For hex letters (hi nibble 0x40 or 0x60) the range is 1..6.

Once we validated that the lower nibbles are correct, we have to add 9 to hex letters and keep decimal digits intact. The following observation lets to obtain the 9 quite easily: for hex letters the upper bound (hi hereafter) is 6, for digits it is 9. The expression ~hi & 9 yields 9 for hi=6 and zero when hi=9.

This branchy algorithm illustrates the whole idea.

const uint_8 hi_nibble = (input >> 4) & 0xff;

uint8_t lo;
uint8_t hi;

switch (hi_nibble) {
    case 0x3:
        lo = 0;
        hi = 9;
        break;
    case 0x4:
    case 0x6:
        lo = 1
        hi = 6
    default:
        /* error */
}

const uint8_t lo_nibble = input & 0x0f;

if ((lo_nibble < lo) or (lo_nibble > hi))
    /* error */

const uint8_t shift = ~hi & 9;

return lo_nibble + shift;

The vectorized algorithm #2.

Sample input.

[46|65|44|63|42|61|39|38|37|36|35|34|33|32|31|30] -- input
 F  e  D  c  B  a  9  8  7  6  5  4  3  2  1   0

Get higher nibble.

[04|06|04|06|04|06|03|03|03|03|03|03|03|03|03|03] -- hi_nibbles

const __m128i mask = _mm_set1_epi8(0x0f);
const __m128i hi_nibbles = _mm_and_si128(_mm_srli_epi32(input, 4), mask);

Get the lower bounds.

[01|01|01|01|01|01|00|00|00|00|00|00|00|00|00|00] -- lo

const __m128i lower_bound = _mm_setr_epi8(
    /* 0x00 */ 0xff,
    /* 0x10 */ 0xff,
    /* 0x20 */ 0xff,
    /* 0x30 */    0,
    /* 0x40 */    1,
    /* 0x50 */ 0xff,
    /* 0x60 */    1,
    /* 0x70 */ 0xff,
    /* 0x80 */ 0xff,
    /* 0x90 */ 0xff,
    /* 0xa0 */ 0xff,
    /* 0xb0 */ 0xff,
    /* 0xc0 */ 0xff,
    /* 0xd0 */ 0xff,
    /* 0xe0 */ 0xff,
    /* 0xf0 */ 0xff
);

const __m128i lo = _mm_shuffle_epi8(lower_bound, hi_nibbles);

Get the upper bounds.

[06|06|06|06|06|06|09|09|09|09|09|09|09|09|09|09] -- hi

const __m128i upper_bound = _mm_setr_epi8(
    /* 0x00 */ 0xff,
    /* 0x10 */ 0xff,
    /* 0x20 */ 0xff,
    /* 0x30 */    9,
    /* 0x40 */    6,
    /* 0x50 */ 0xff,
    /* 0x60 */    6,
    /* 0x70 */ 0xff,
    /* 0x80 */ 0xff,
    /* 0x90 */ 0xff,
    /* 0xa0 */ 0xff,
    /* 0xb0 */ 0xff,
    /* 0xc0 */ 0xff,
    /* 0xd0 */ 0xff,
    /* 0xe0 */ 0xff,
    /* 0xf0 */ 0xff
);

const __m128i hi = _mm_shuffle_epi8(upper_bound, hi_nibbles);

Validate lower nibbles against the ranges. Please note that SSE does not provide unsigned comparison, however in our case we use only lower nibbles, so never hit signed numbers.

const __m128i gt_hi = _mm_cmplt_epi8(hi, lo_nibbles); // lo_nibble > hi_bound
const __m128i lt_lo = _mm_cmplt_epi8(lo_nibbles, lo); // lo_nibble < lo_bound
const __m128i error = _mm_or_si128(gt_hi, lt_lo);

if (_mm_movemask_epi8(error)) {
    ok = false;
    return 0;
}

Keep only lower nibbles.

[0f|0e|0d|0c|0b|0a|09|08|07|06|05|04|03|02|01|00] -- result

const __m128i mask  = _mm_set1_epi8(0x0f);
const __m128i result = _mm_and_si128(input, mask);

Add 9 to hex letter, keep digits intact.

const __m128i shift = _mm_andnot_si128(hi, _mm_set1_epi8(9));
const __m128i result = _mm_add_epi8(lo_nibbles, shift);

Reverse nibbles and join into a single 64-bit word — see separate section.

Algorithm #3 — by Geoff Langdale

The Geoff's approach does not use PSHUFB, it is based on shifting number ranges. The algorithm works in two phases, separately converting the digits and hex letters.

First, digits are shifted from range '0'..'9' to 0xf6..0xff and then 0xf0..0xf9. All other bytes — above '9' and below '0' — become smaller than 0xf0. After subtracting 0xf0, the ASCII digits are transformed into valid nibbles (0..9), while other values become greater than 0x0f.

The second step converts the hex digits. At the very beginning, the ASCII letters are converted into upper case by resetting the 5th bit. Then, in a similar way as with the digits, we shift ASCII letters 'A'..'F' into range 0..5. Any other bytes become bigger than 5. Then we adjust range 0..5 to 10..15, keeping the non-hex bytes greater than 0x0f.

At this point we have the following possibilities:

digits — the result of step #1 is a valid nibble (0..9), the result of step #2 is greater than 0x0f.
hex letters — the result of step #1 is greater than 0xf0, the result of step #2 is a valid nibble (10..15).
other — results of both steps are greater than 0x0f.

It is sufficient to find the minimum of both steps. If either step resulted in valid nibble it will be chosen. Otherwise, the minimum operation yields some byte greater than 0x0f. Thus, to detect errors we have only check if the result of minimum is greater than 0x0f.

The vectorized algorithm #3.

Move digits '0'..'9' into range 0xf6..0xff.

const __m128i t1 = _mm_add_epi8(v, _mm_set1_epi8(char(0xff - '9')));

And then correct the range to 0xf0..0xf9. All other bytes become less than 0xf0.
```
const __m128i t2 = _mm_subs_epu8(t1, _mm_set1_epi8(6));
```
Convert '0'..'9' into nibbles 0..9. Non-digit bytes become greater than 0x0f.
```
const __m128i t3 = _mm_sub_epi8(t2, _mm_set1_epi8(char(0xf0)));
```

Convert into uppercase 'a'..'f' => 'A'..'F'.

const __m128i t4 = _mm_and_si128(v, _mm_set1_epi8(char(0xdf)));

Move hex letter 'A'..'F' into range 0..5.

const __m128i t5 = _mm_sub_epi8(t4, _mm_set1_epi8('A'))

And correct the range into 10..15. The non-hex letters bytes become greater than 0x0f.
```
const __m128i t6 = _mm_adds_epu8(t5, _mm_set1_epi8(10));
```
Finally choose the result: either valid nibble (0..9/10..15) or some byte greater than 0x0f.
```
const __m128i t7 = _mm_min_epu8(t3, t6);
```

Detect errors, i.e. bytes greater than 15. As SSE does not provide an unsigned compare, we have to use a trick with the saturated add.

const __m128i t8 = _mm_adds_epu8(t7, _mm_set1_epi8(127-15));
if (_mm_movemask_epi8(t8)) {
    ok = false;
    return 0;
}

Reverse nibbles and join into a single 64-bit word — see separate section.

Comparison

The total number of instructions
instruction	algorithm #1	algorithm #2	algorithm #3
shift	1	1	--
pshufb	3	2	--
bit-and	7	3	1
bit-or	2	1	--
comparison	2	2	--
move mask	1	1	--
add/sub	--	1	6
min	--	--	1
sum	16	11	8

Converting nibbles to binary value

After completing any of the above algorithms we have the following layout of 16 nibbles (from a to p):

[0000|aaaa|0000|bbbb|0000|cccc|0000|dddd|...|0000|oooo|0000|pppp]
| byte 15 | byte 14 | byte 13 | byte 12 |...| byte 1  | byte 0  |

The target is 64-bit word, where nibbles are reversed:

[pppp|oooo|...|dddd|cccc|bbbb|aaaa]

What we need to do, is: 1) first put two from adjacent nibbles in a byte, and 2) store only bytes having proper nibble pairs.

Joining adjacent nibbles can be easily done with shift, followed by bit-and and bit-or. A single instruction pmaddubsw can do that too. It multiplies two pairs of bytes, add the products and yield 16-bit words.

Then we need to store only single bytes from 16-bit words in a 64-bit result word (the lower half of an SSE register).

Below is a code snippet that shows constants for pmaddubsw and pshufb.

const __m128i t0 = _mm_maddubs_epi16(result, _mm_set1_epi16(0x0110));
const __m128i t1 = _mm_setr_epi8(
    14, 12, 10, 8, 6, 4, 2, 0,
    -1, -1, -1, -1, -1, -1, -1, -1);
const __m128i t3 = _mm_shuffle_epi8(t0, t1);

Sample code

Sample code is available.

https://github.com/WojciechMula/toys/tree/master/simd-parse-hex