Author: | Wojciech Muła |
---|---|
Added on: | 2015-10-25 |
The problem is defined as follows: a set of features is saved using bit-sets (usually large), and there is a list/map/whatever of sets containing features of different objects. We have to find which features are unique.
Naive solution is to use nested loops:
bit_set find_unique(list) { unsigned size = FEATURES_COUNT; bit_set result(size); for (auto i=0; i < size; i++) { unsigned count = 0; for (auto& set: list) { if (set.is_set(i)) { count += 1; } } if (count == 1) { result.set(i); } } return result; }
Not good, the method is_set of bit-set is called size * list.size() times. Even if a compiler is able to inline the call and use simple bit tests instructions it's still too expansive. Bit-set implementations always use arrays of integers to store the data, thanks to that bit-operations (and, or, xor, etc.) are executed very fast. We try to exploit this with boolean functions.
Each feature could be described as:
Now these states are encoded using two bits:
H | L | value |
---|---|---|
0 | 0 | non-existing |
0 | 1 | unique |
1 | 1 | non-unique |
Then we define a transition table. For example if a feature is present and the current value is 'unique' then the next value is 'non-unique' (row 5th).
H | L | feature | H' | L' |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 1 |
1 | 1 | 0 | 1 | 1 |
0 | 0 | 1 | 0 | 1 |
0 | 1 | 1 | 1 | 1 |
1 | 1 | 1 | 1 | 1 |
Boolean expressions are: L' = L or feature; H' = H or (feature and L).
We also need to get single bit-set from H and L at the end:
H | L | result |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 1 | 0 |
The final boolean expression is: result = L and not H.
Now we can rewrite the code:
class unique_checker { unsigned size; bit_set L, H; public: unique_checker(unsigned size) : size(size) , L(size) , H(size) {} void update(const bit_set& set) { assert(set.size() == size); H = H | (set & L); // we suppose that bit_set overloads bit-operator L = L | set; } bit_set finalize() const { return L | ~H; } }; bit_set find_unique2(list) { unique_checker checker(FEATURES_COUNT); for (auto& set: list) { checker.update(set); } return checker.finalize(); }
I really like this approach.