Author: | Wojciech Muła |
---|---|
Added on: | 2018-03-14 |
Updated on: | 2018-03-16 |
Contents
The intersection of two sets represented by sorted collections (like lists or arrays) can be done in linear time. If we label with k the size of one collection, and with n the size of another collection, then the complexity of intersection is O(n + k).
Below is a naive C++ implementation; the C++ standard library comes with std::set_intersection.
template <typename INSERTER> void custom_set_intersection(const vec& A, const vec& B, INSERTER output) { size_t i = 0; size_t j = 0; while (i < A.size() && j < B.size()) { if (A[i] < B[j]) { i += 1; } else if (B[j] < A[i]) { j += 1; } else { // A[i] == B[j] *output++ = A[i]; i += 1; j += 1; } } }
If there is a huge difference between sizes of collections, then we might use two different approaches described below. I use terms "smaller" (k items) and "larger" collection/set (n items).
In this approach we iterate over the smaller collection. For each item the binary search on the larger collection is performed. The complexity of this solution is O(k ⋅ logn).
A clear disadvantage is that at least the larger collection must support random access. It must be an array, a vector or something similar.
To make a practical implementation faster, we utilize the fact that both collections are sorted. In each iteration we are not looking for an equal element but for the first element which is greater or equal to the searched one. Thanks to that the searched range is narrowed after each iteration: the part of the larger collection containing elements smaller than the elements already tested is never touched again. It doesn't change the asymptotic complexity, though.
In C++ the function std::lower_bound implements this look up; it's a modification of the binary search algorithm.
Below is a C++ implementation.
template <typename INSERTER> void binsearch_set_intersection(const vec& A, const vec& B, INSERTER output) { auto it = B.begin(); for (const auto& a: A) { it = std::lower_bound(it, B.end(), a); if (it == B.end()) { // since there are no values greater or equal the current, // there also won't be any for subsequent value (all greater than current) return; } if (*it == a) { output = a; ++it; } } }
Lets assume that the number of operations in the linear algorithm is exactly n + k. Likewise, for binary search we may assume that on average there are k ⋅ log2n operations. It is of course a simplification, because we are unable to derive the exact numbers of operations, as they depend on input values and their distribution.
We start with an obvious inequality:
k + n < k ⋅ log2(n)
n < k(log2(n) − 1)
n/k < log2(n) − 1
If the proportion n/k is smaller than log2(n) − 1 then the linear algorithm will perform better. After a simple transformation we get the threshold for size of the smaller set:
k > n/(log2(n) − 1)
For example, if n = 1.000.000 then k≈ 50.000; the proportion of sizes is 18-19. In practise, we may approximate function log2 with the position of the first bit set in the number k.
A SIMD approach is vectorized linear algorithm in which a single value from the smaller set is compared with vectors read from the larger set. Thanks to that the number of comparisons is smaller and we can faster skip parts of the larger collection.
In the example below we assume that the first element of the smaller collection is 9, and the head of the larger collection contains values [-5, 1, 2, 5, 7, 8, 9, 12].
Fill the vector A with the first item from the smaller collection:
+-----+-----+-----+-----+ | 9 | 9 | 9 | 9 | +-----+-----+-----+-----+
Load into another vector B a part of the larger collection:
+-----+-----+-----+-----+ | -5 | 1 | 2 | 5 | +-----+-----+-----+-----+
Compare B < A:
+-----+-----+-----+-----+ | -1 | -1 | -1 | -1 | +-----+-----+-----+-----+
If all items in B are smaller (all bits of the comparison vector are set), then load the next chunk into B:
+-----+-----+-----+-----+ | 7 | 8 | 9 | 12 | +-----+-----+-----+-----+
And keep comparing with A; now B < A vector is equal:
+-----+-----+-----+-----+ | -1 | -1 | 0 | 0 | +-----+-----+-----+-----+
The result vector contains some zero items. It means that B may contain the value present in A. If it's true (in the example it is), save this value in the output collection. In either case fill A with the next value from the smaller collection and again compare with B.
Below is a sample implementation that uses SSE instructions.
template <typename INSERTER> void sse_set_intersection(const vec& A, const vec& B, INSERTER output) { __m128i a_rep; __m128i b; size_t ai = 0; size_t bi = 0; a_rep = _mm_set1_epi32(A[ai]); b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&B[bi])); while (ai < A.size() && bi < B.size()) { const __m128i lt = _mm_cmplt_epi32(b, a_rep); const uint16_t mask = _mm_movemask_epi8(lt); if (mask == 0xffff) { // all elements in b are smaller, fetch the next chunk from B bi += 4; b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&B[bi])); } else { // there might be element equal to A[ai] // a simple linear search, as there're only 4 elements to search in for (size_t i = 0; i < 4; i++) { if (B[bi + i] == A[ai]) { output = A[ai]; break; } } // fetch the next value from A ai += 1; a_rep = _mm_set1_epi32(A[ai]); } } }
Experiment:
Full source code is available.
size A | size B | size ratio | std [us] | SSE [us] | binsearch [us] |
---|---|---|---|---|---|
128 | 1048576 | 8192.00 | 1802 | 666 | 16 |
x 2.71 | x 112.62 | ||||
256 | 1048576 | 4096.00 | 1803 | 670 | 37 |
x 2.69 | x 48.73 | ||||
384 | 1048576 | 2730.67 | 1807 | 672 | 53 |
x 2.69 | x 34.09 | ||||
512 | 1048576 | 2048.00 | 1807 | 674 | 69 |
x 2.68 | x 26.19 | ||||
640 | 1048576 | 1638.40 | 1816 | 684 | 84 |
x 2.65 | x 21.62 | ||||
768 | 1048576 | 1365.33 | 1817 | 685 | 97 |
x 2.65 | x 18.73 | ||||
896 | 1048576 | 1170.29 | 1820 | 690 | 112 |
x 2.64 | x 16.25 | ||||
1024 | 1048576 | 1024.00 | 1823 | 698 | 127 |
x 2.61 | x 14.35 | ||||
1152 | 1048576 | 910.22 | 1824 | 695 | 146 |
x 2.62 | x 12.49 | ||||
1280 | 1048576 | 819.20 | 1825 | 705 | 159 |
x 2.59 | x 11.48 | ||||
2560 | 1048576 | 409.60 | 1844 | 738 | 433 |
x 2.50 | x 4.26 | ||||
6400 | 1048576 | 163.84 | 1889 | 819 | 1009 |
x 2.31 | x 1.87 | ||||
2048 | 1048576 | 512.00 | 1837 | 724 | 319 |
x 2.54 | x 5.76 | ||||
3072 | 1048576 | 341.33 | 1851 | 750 | 549 |
x 2.47 | x 3.37 | ||||
4096 | 1048576 | 256.00 | 1863 | 770 | 719 |
x 2.42 | x 2.59 | ||||
5120 | 1048576 | 204.80 | 1875 | 796 | 863 |
x 2.36 | x 2.17 | ||||
6144 | 1048576 | 170.67 | 1888 | 817 | 982 |
x 2.31 | x 1.92 | ||||
7168 | 1048576 | 146.29 | 1900 | 836 | 1092 |
x 2.27 | x 1.74 | ||||
8192 | 1048576 | 128.00 | 1913 | 857 | 1180 |
x 2.23 | x 1.62 | ||||
9216 | 1048576 | 113.78 | 1927 | 876 | 1274 |
x 2.20 | x 1.51 | ||||
10240 | 1048576 | 102.40 | 1938 | 893 | 1349 |
x 2.17 | x 1.44 | ||||
20480 | 1048576 | 51.20 | 2059 | 1100 | 1892 |
x 1.87 | x 1.09 | ||||
51200 | 1048576 | 20.48 | 2388 | 1741 | 2878 |
x 1.37 | x 0.83 |
size A | size B | size ratio | std [us] | SSE [us] | AVX2 [us] | binsearch [us] |
---|---|---|---|---|---|---|
128 | 1048576 | 8192.00 | 771 | 313 | 190 | 2 |
x 2.46 | x 4.06 | x 385.50 | ||||
256 | 1048576 | 4096.00 | 772 | 316 | 192 | 8 |
x 2.44 | x 4.02 | x 96.50 | ||||
384 | 1048576 | 2730.67 | 774 | 318 | 193 | 24 |
x 2.43 | x 4.01 | x 32.25 | ||||
512 | 1048576 | 2048.00 | 775 | 320 | 195 | 38 |
x 2.42 | x 3.97 | x 20.39 | ||||
640 | 1048576 | 1638.40 | 780 | 324 | 197 | 50 |
x 2.41 | x 3.96 | x 15.60 | ||||
768 | 1048576 | 1365.33 | 780 | 326 | 199 | 59 |
x 2.39 | x 3.92 | x 13.22 | ||||
896 | 1048576 | 1170.29 | 782 | 329 | 201 | 70 |
x 2.38 | x 3.89 | x 11.17 | ||||
1024 | 1048576 | 1024.00 | 784 | 331 | 203 | 81 |
x 2.37 | x 3.86 | x 9.68 | ||||
1152 | 1048576 | 910.22 | 785 | 334 | 204 | 90 |
x 2.35 | x 3.85 | x 8.72 | ||||
1280 | 1048576 | 819.20 | 786 | 336 | 206 | 98 |
x 2.34 | x 3.82 | x 8.02 | ||||
2560 | 1048576 | 409.60 | 801 | 361 | 224 | 178 |
x 2.22 | x 3.58 | x 4.50 | ||||
6400 | 1048576 | 163.84 | 844 | 438 | 274 | 364 |
x 1.93 | x 3.08 | x 2.32 | ||||
2048 | 1048576 | 512.00 | 796 | 351 | 217 | 148 |
x 2.27 | x 3.67 | x 5.38 | ||||
3072 | 1048576 | 341.33 | 807 | 372 | 230 | 206 |
x 2.17 | x 3.51 | x 3.92 | ||||
4096 | 1048576 | 256.00 | 818 | 391 | 244 | 258 |
x 2.09 | x 3.35 | x 3.17 | ||||
5120 | 1048576 | 204.80 | 830 | 412 | 259 | 305 |
x 2.01 | x 3.20 | x 2.72 | ||||
6144 | 1048576 | 170.67 | 841 | 431 | 271 | 351 |
x 1.95 | x 3.10 | x 2.40 | ||||
7168 | 1048576 | 146.29 | 853 | 451 | 285 | 396 |
x 1.89 | x 2.99 | x 2.15 | ||||
8192 | 1048576 | 128.00 | 864 | 471 | 299 | 438 |
x 1.83 | x 2.89 | x 1.97 | ||||
9216 | 1048576 | 113.78 | 875 | 490 | 312 | 480 |
x 1.79 | x 2.80 | x 1.82 | ||||
10240 | 1048576 | 102.40 | 886 | 510 | 325 | 520 |
x 1.74 | x 2.73 | x 1.70 | ||||
20480 | 1048576 | 51.20 | 995 | 704 | 459 | 880 |
x 1.41 | x 2.17 | x 1.13 | ||||
51200 | 1048576 | 20.48 | 1302 | 1292 | 881 | 1811 |
x 1.01 | x 1.48 | x 0.72 |