Boolean function for the rescue

Author:Wojciech Muła
Added on:2015-10-25

The problem is defined as follows: a set of features is saved using bit-sets (usually large), and there is a list/map/whatever of sets containing features of different objects. We have to find which features are unique.

Naive solution is to use nested loops:

bit_set find_unique(list) {
    unsigned size = FEATURES_COUNT;
    bit_set  result(size);

    for (auto i=0; i < size; i++) {

        unsigned count = 0;
        for (auto& set: list) {
            if (set.is_set(i)) {
                count += 1;
            }
        }

        if (count == 1) {
            result.set(i);
        }
    }

    return result;
}

Not good, the method is_set of bit-set is called size * list.size() times. Even if a compiler is able to inline the call and use simple bit tests instructions it's still too expansive. Bit-set implementations always use arrays of integers to store the data, thanks to that bit-operations (and, or, xor, etc.) are executed very fast. We try to exploit this with boolean functions.

Each feature could be described as:

Now these states are encoded using two bits:

H L value
0 0 non-existing
0 1 unique
1 1 non-unique

Then we define a transition table. For example if a feature is present and the current value is 'unique' then the next value is 'non-unique' (row 5th).

H L feature H' L'
0 0 0 0 0
0 1 0 0 1
1 1 0 1 1
0 0 1 0 1
0 1 1 1 1
1 1 1 1 1

Boolean expressions are: L' = L or feature; H' = H or (feature and L).

We also need to get single bit-set from H and L at the end:

H L result
0 0 0
0 1 1
1 1 0

The final boolean expression is: result = L and not H.

Now we can rewrite the code:

class unique_checker {

    unsigned  size;
    bit_set   L, H;

public:
    unique_checker(unsigned size)
        : size(size)
        , L(size)
        , H(size) {}

    void update(const bit_set& set) {

        assert(set.size() == size);

        H = H | (set & L);  // we suppose that bit_set overloads bit-operator
        L = L | set;
    }

    bit_set finalize() const {

        return L | ~H;
    }
};

bit_set find_unique2(list) {

    unique_checker checker(FEATURES_COUNT);

    for (auto& set: list) {
        checker.update(set);
    }

    return checker.finalize();
}

I really like this approach.