Author: | Wojciech Muła |
---|---|
Added on: | 2016-09-14 |
Given a SIMD register (128, 256 or 512 bits wide), we want to set all bits at position k and above and clear the bits below it; k ranges from 0 to the register's width.
Of course a lookup table could be used, but that's not very interesting (well, maybe a little).
Treat the register as a set of chunks, where a chunk can be a word, a double word, etc. Let chunk_size be the number of bits in a chunk; then the chunk that contains bit k has index n = k / chunk_size.
All chunks above the n-th have to be filled with ones and all chunks below it cleared. The only exception is the n-th chunk itself, which must be filled partially: only its bits from offset k % chunk_size upward are set.
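The chunk rule above can be sketched in plain scalar C before turning to SIMD. This is only an illustration under assumptions: the register is modelled as an array of eight 32-bit chunks (256 bits), and the names `mask_from_bit_scalar`, `CHUNK_SIZE` and `CHUNK_COUNT` are hypothetical, not taken from the article's repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 8 chunks of 32 bits model one 256-bit register. */
enum { CHUNK_SIZE = 32, CHUNK_COUNT = 8 };

/* Hypothetical scalar reference: fills out[] with the mask for position k.
   out[0] holds bits 0..31 of the 256-bit value, out[7] holds bits 224..255. */
static void mask_from_bit_scalar(size_t k, uint32_t out[CHUNK_COUNT]) {
    assert(k <= CHUNK_SIZE * CHUNK_COUNT);

    const size_t n = k / CHUNK_SIZE;     /* index of the chunk containing bit k */
    const size_t shift = k % CHUNK_SIZE; /* offset of bit k inside that chunk   */

    for (size_t i = 0; i < CHUNK_COUNT; i++) {
        if (i > n) {
            out[i] = 0xffffffffu;          /* chunks above n: all ones      */
        } else if (i == n) {
            out[i] = 0xffffffffu << shift; /* n-th chunk: partially filled  */
        } else {
            out[i] = 0;                    /* chunks below n: cleared       */
        }
    }
}
```

For k equal to the register width, n is one past the last chunk, so no branch fills anything and the mask comes out all zeros; for k = 0 every bit is set.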
Algorithm:
    const size_t chunk_size = 32;
    const size_t n = k / chunk_size;     // n = 2
    const size_t shift = k % chunk_size; // shift = 7

    const __m256i chunk_numbers = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    const __m256i chunk = _mm256_set1_epi32(n);

    //                  7          6          5          4          3          2          1          0
    // tmp1 = [0xffffffff|0xffffffff|0xffffffff|0xffffffff|0xffffffff|0x00000000|0x00000000|0x00000000]
    const __m256i tmp1 = _mm256_cmpgt_epi32(chunk_numbers, chunk);

    // tmp2 = [0x00000000|0x00000000|0x00000000|0x00000000|0x00000000|0xffffffff|0x00000000|0x00000000]
    const __m256i tmp2 = _mm256_cmpeq_epi32(chunk_numbers, chunk);

    // tmp3[2] = 0b11111111_11111111_11111111_10000000 = 0xffffff80
    const __m256i tmp3 = _mm256_slli_epi32(tmp2, shift);

    // result = [0xffffffff|0xffffffff|0xffffffff|0xffffffff|0xffffffff|0xffffff80|0x00000000|0x00000000]
    const __m256i result = _mm256_or_si256(tmp1, tmp3);
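The steps above can be wrapped into a self-contained function, roughly as follows. This is a sketch under assumptions, not the code from the repository: the names `mask_from_bit_avx2`, `mask_from_bit_avx2_store` and the `AVX2_TARGET` macro are hypothetical, the target attribute assumes GCC or Clang, and the shift uses `_mm256_sll_epi32` (count in a register) in place of `_mm256_slli_epi32`, because the count here is only known at run time.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Lets the file compile without passing -mavx2. */
#if defined(__GNUC__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

/* Hypothetical helper: 256-bit mask with bits k..255 set, for 0 <= k <= 256. */
static AVX2_TARGET __m256i mask_from_bit_avx2(size_t k) {
    assert(k <= 256);

    const size_t chunk_size = 32;
    const int n = (int)(k / chunk_size);     /* chunk containing bit k     */
    const int shift = (int)(k % chunk_size); /* offset inside that chunk   */

    const __m256i chunk_numbers = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    const __m256i chunk = _mm256_set1_epi32(n);

    /* All ones in every chunk whose index is greater than n. */
    const __m256i above = _mm256_cmpgt_epi32(chunk_numbers, chunk);
    /* All ones only in the n-th chunk (nowhere when n == 8). */
    const __m256i equal = _mm256_cmpeq_epi32(chunk_numbers, chunk);
    /* Shift count is a run-time value, so it is passed in an XMM register. */
    const __m256i partial = _mm256_sll_epi32(equal, _mm_cvtsi32_si128(shift));

    return _mm256_or_si256(above, partial);
}

/* Hypothetical helper for inspecting the mask: out[0] holds bits 0..31. */
static AVX2_TARGET void mask_from_bit_avx2_store(size_t k, uint32_t out[8]) {
    _mm256_storeu_si256((__m256i *)out, mask_from_bit_avx2(k));
}
```

The comparisons are signed, which is harmless here because chunk indices never exceed 8. Running this requires a CPU with AVX2 support.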
The GitHub repository contains an example program with tests.