C++ bitset vs array

Author:Wojciech Muła
Added on:2014-03-22

The C++ bitset conserves a memory, but at cost of speed access. The bitset must be slower than a set represented as a plain old array, at least when sets are small (say a few hundred elements).

Lets look at this simple functions:

// set_test.cpp

#include <stdint.h>
#include <bitset>

const int size = 128;

typedef uint8_t byte_set[size];

bool any_in_byteset(uint8_t* data, size_t size, byte_set set) {
        for (auto i=0u; i < size; i++)
                if (set[data[i]])
                        return true;

        return false;

typedef std::bitset<size> bit_set;

bool any_in_bitset(uint8_t* data, size_t size, bit_set set) {
        for (auto i=0u; i < size; i++)
                if (set[data[i]])
                        return true;

        return false;

The file was compiled with g++ -std=c++11 -O3 set_test.cpp; Assembly code of the core of any_in_byteset:

28: 0f b6 10              movzbl (%eax),%edx
2b: 83 c0 01              add    $0x1,%eax
2e: 80 3c 11 00           cmpb   $0x0,(%ecx,%edx,1)
32: 75 0c                 jne    40
34: 39 d8                 cmp    %ebx,%eax
36: 75 f0                 jne    28

Statement if (set[data[i]]) return true are lines 28, 2e and 32, i.e.: load from memory, compare and jump. Instructions 2b, 34 and 36 handles the for loop.

Now look at assembly code of any_in_bitset:

5f: 0f b6 13              movzbl (%ebx),%edx
62: b8 01 00 00 00        mov    $0x1,%eax
67: 89 d1                 mov    %edx,%ecx
69: 83 e1 1f              and    $0x1f,%ecx
6c: c1 ea 05              shr    $0x5,%edx
6f: d3 e0                 shl    %cl,%eax
71: 85 44 94 18           test   %eax,0x18(%esp,%edx,4)
75: 75 39                 jne    b0

All these instructions implements the if statement! Again, we have a load from memory (5f), but checking which bit is set requires much more work. The input (edx) is split to the lower part — i.e. bit number (67, 6c) and the higher part — i.e. word index (6c). The last step is to check if a bit is set in a word — GCC used variable shift left (6f), but x86 has BT instruction, so in the perfect code we would have two instructions less.

However, as we see a simple access in the bitset is much more complicated than simple memory fetch from byteset. For small sets memory fetches are well cached and smaller number of instruction improves performance. For really large sets cache misses would kill performance, then bitset is much better choice.