Not everything in AVX2 is 256-bit
Author: Wojciech Muła
Added on: 2015-03-21
AVX2 added 256-bit support for many, although not all, operations on
packed integers. Some instructions accept 256-bit registers, but
operate on each 128-bit lane separately rather than on the whole register.
There are three major groups of instructions:
- packing (narrowing conversion),
- unpacking (interleave),
- and permutations.
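To make the per-lane behaviour concrete, here is a minimal scalar model in C of what _mm256_unpacklo_epi32 computes. The function name unpacklo_epi32_model is made up for this illustration; it is only a sketch of the lane semantics, not code from any library.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_unpacklo_epi32(a, b).
 * The 256-bit register is treated as two independent 128-bit lanes
 * (4 x int32 each); inside each lane the two lowest elements of a and b
 * are interleaved. Elements never move from one lane to the other. */
void unpacklo_epi32_model(const int32_t a[8], const int32_t b[8],
                          int32_t out[8])
{
    for (int lane = 0; lane < 2; lane++) {
        const int base = lane * 4;      /* first element of this lane */
        out[base + 0] = a[base + 0];
        out[base + 1] = b[base + 0];
        out[base + 2] = a[base + 1];
        out[base + 3] = b[base + 1];
    }
}
```

With a = 0..7 and b = 10..17 the model yields 0, 10, 1, 11, 4, 14, 5, 15. A true 256-bit interleave of the low halves would instead give 0, 10, 1, 11, 2, 12, 3, 13.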
Below is a full list of instructions (with intrinsics):
- vpalignr (_mm256_alignr_epi8)
- vpslldq (_mm256_bslli_epi128)
- vpsrldq (_mm256_bsrli_epi128)
- vmpsadbw (_mm256_mpsadbw_epu8)
- vpacksswb (_mm256_packs_epi16)
- vpackssdw (_mm256_packs_epi32)
- vpackuswb (_mm256_packus_epi16)
- vpackusdw (_mm256_packus_epi32)
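- vperm2i128 (_mm256_permute2x128_si256); selects whole 128-bit lanes
- vpermq (_mm256_permute4x64_epi64); note: its 64-bit elements can cross lanes
- vpermpd (_mm256_permute4x64_pd); note: its 64-bit elements can cross lanes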
- vpshufd (_mm256_shuffle_epi32)
- vpshufb (_mm256_shuffle_epi8)
- vpshufhw (_mm256_shufflehi_epi16)
- vpshuflw (_mm256_shufflelo_epi16)
- vpslldq (_mm256_slli_si256)
- vpsrldq (_mm256_srli_si256)
- vpunpckhwd (_mm256_unpackhi_epi16)
- vpunpckhdq (_mm256_unpackhi_epi32)
- vpunpckhqdq (_mm256_unpackhi_epi64)
- vpunpckhbw (_mm256_unpackhi_epi8)
- vpunpcklwd (_mm256_unpacklo_epi16)
- vpunpckldq (_mm256_unpacklo_epi32)
- vpunpcklqdq (_mm256_unpacklo_epi64)
- vpunpcklbw (_mm256_unpacklo_epi8)
To me the most surprising are the packing instructions (vpack*): because
they pack each 128-bit lane separately, they require additional shuffling
(before or after the instruction) if we want to keep the original order
of values. In some cases the order is crucial.
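The reordering can be sketched as follows for vpackssdw. This is an illustration under stated assumptions, not the only possible fix: the names saturate_i16, pack_in_lane_model, fix_order_model and pack_epi32_in_order are hypothetical, the first three form a portable scalar model of the instruction, and the last one (compiled only when AVX2 is enabled) shows one way to restore the order with _mm256_permute4x64_epi64, which moves 64-bit elements across the whole register.

```c
#include <stdint.h>
#include <string.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Signed saturation of a 32-bit value to the int16_t range. */
int16_t saturate_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of _mm256_packs_epi32(a, b): each 128-bit
 * lane of the result holds 4 values from a followed by 4 values from b,
 * both taken from the same lane of the inputs. */
void pack_in_lane_model(const int32_t a[8], const int32_t b[8],
                        int16_t out[16])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 4; i++) {
            out[lane * 8 + i]     = saturate_i16(a[lane * 4 + i]);
            out[lane * 8 + 4 + i] = saturate_i16(b[lane * 4 + i]);
        }
    }
}

/* Hypothetical model of _mm256_permute4x64_epi64(x, _MM_SHUFFLE(3, 1, 2, 0)):
 * swaps the two middle 64-bit quarters (4 x int16 each). */
void fix_order_model(const int16_t in[16], int16_t out[16])
{
    static const int src_quarter[4] = {0, 2, 1, 3};
    for (int q = 0; q < 4; q++)
        memcpy(out + 4 * q, in + 4 * src_quarter[q], 4 * sizeof(int16_t));
}

#if defined(__AVX2__)
/* The same two steps with intrinsics: pack, then undo the lane split. */
static inline __m256i pack_epi32_in_order(__m256i a, __m256i b)
{
    const __m256i packed = _mm256_packs_epi32(a, b);
    return _mm256_permute4x64_epi64(packed, _MM_SHUFFLE(3, 1, 2, 0));
}
#endif
```

For a = 0..7 and b = 8..15 the pack step alone gives 0, 1, 2, 3, 8, 9, 10, 11, 4, 5, 6, 7, 12, 13, 14, 15; after the permutation the values are back in the order 0..15. The extra shuffle is the price for keeping the order.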