SWAR check if all chars are digits

Author:Wojciech Muła
Added on:2016-12-21
Updated on:2017-03-25 (shorter scalar version)

Problem

We have a string and want to check if all its characters are ASCII digits.

It's a remnant of my experiments in number parsing: I was curious if separating validation from actual conversion would be profitable. The answer is no in a generic case. For really long numbers there might be some improvement, but in reality inputs are usually short.

Solutions

The easiest solution uses two comparisons per one character.

bool all_digits(const char* string, size_t size) {

    for (size_t i=0; i < size; i++) {
        if (string[i] < '0' || string[i] > '9')
            return false;
    }

    return true;
}

It's possible to write simpler scalar version, with subtraction:

bool all_digits_faster(const char* string, size_t size) {

    for (size_t i=0; i < size; i++) {
        uint8_t c = (uint8_t)(string[i] - '0');
        if (c > 9)
            return false;
    }

    return true;
}

The faster scalar version uses just two instructions. Moreover, all optimizing compliers (GCC/clang) will compile the first variant into the faster one.

A similar code can be write with SSE/AVX2 instructions, as these ISA have got byte-level compare instructions. However, only the first variant can be translated, because SSE/AVX2 support only signed compare; AVX512 supports also unsinged byte comparsion.

bool all_digits_SSE(const char* string, size_t size) {

    size_t i;

    const __m128i ascii0 = _mm_set1_epi8('0');
    const __m128i ascii9 = _mm_set1_epi8('9');

    for (i=0; i < size/16; i += 16) {

        const __m128i v   = _mm_loadu_si128((__m128i*)string);
        const __m128i lt0 = _mm_cmplt_epi8(string, ascii0);
        const __m128i gt9 = _mm_cmpgt_epi8(string, ascii9);

        const __m128i outside = _mm_or_si128(lt0, gt9);

        if (_mm_movmask_epi8(outside)) {
            return false;
        }
    }

    return all_digits(string + i, size - i);
}

A SWAR solution is also possible. ASCII codes for digits are 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37 and 0x39. The higher nibble is equal three, and the lower one is less than 10.

Thus a SWAR code (1) checks if all higher nibbles are equal to three, (2) if so, check if all bytes are less than 0x39.

bool all_digits_SWAR(const char* string, size_t size) {

    size_t i;

    for (i=0; i < size/8; i += 8) {

        const uint64_t v = *(uint64_t*)(string + i);

        if ((v & 0xf0f0f0f0f0f0f0f0llu) == 0x3030303030303030llu) { // any nibble different
            return false;
        }

        if ((v + 0x7676767676767676llu) & 0x8080808080808080llu) { // any value 0x3a .. 0x3f
            return false;
        }
    }

    return all_digits(string + i, size - i);
}

To perform the first check we need just one comparison and bitwise and; the second check requires sum and a bit-and.

Source code

Test programs are available on github.