Not everything in AVX2 is 256-bit

Author:	Wojciech Muła
Added on:	2015-03-21

AVX2 added support for 256-bit arguments to many operations on packed integers, although not all. Some instructions accept 256-bit registers but operate on each 128-bit lane separately rather than on the whole register.

There are three major groups of instructions:

packing (narrowing conversion),
unpacking (interleave),
and permutations.

Below is a full list of instructions (with intrinsics):

vpalignr (_mm256_alignr_epi8)
vpslldq (_mm256_bslli_epi128)
vpsrldq (_mm256_bsrli_epi128)
vmpsadbw (_mm256_mpsadbw_epu8)
vpacksswb (_mm256_packs_epi16)
vpackssdw (_mm256_packs_epi32)
vpackuswb (_mm256_packus_epi16)
vpackusdw (_mm256_packus_epi32)
vperm2i128 (_mm256_permute2x128_si256)
vpermq (_mm256_permute4x64_epi64)
vpermpd (_mm256_permute4x64_pd)
vpshufd (_mm256_shuffle_epi32)
vpshufb (_mm256_shuffle_epi8)
vpshufhw (_mm256_shufflehi_epi16)
vpshuflw (_mm256_shufflelo_epi16)
vpslldq (_mm256_slli_si256)
vpsrldq (_mm256_srli_si256)
vpunpckhwd (_mm256_unpackhi_epi16)
vpunpckhdq (_mm256_unpackhi_epi32)
vpunpckhqdq (_mm256_unpackhi_epi64)
vpunpckhbw (_mm256_unpackhi_epi8)
vpunpcklwd (_mm256_unpacklo_epi16)
vpunpckldq (_mm256_unpacklo_epi32)
vpunpcklqdq (_mm256_unpacklo_epi64)
vpunpcklbw (_mm256_unpacklo_epi8)
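
To illustrate what operating on 128-bit lanes means in practice, here is a minimal sketch in C using _mm256_unpacklo_epi32 from the list above. The helper name unpacklo_epi32_inlane is hypothetical, and the target attribute is a GCC/Clang extension, assumed here so the snippet builds without the -mavx2 flag. An AVX2-capable CPU is needed to run it.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: interleaves 32-bit values of a and b with
   vpunpckldq. The attribute is a GCC/Clang extension that enables
   AVX2 for this function only. */
__attribute__((target("avx2")))
void unpacklo_epi32_inlane(const int32_t a[8], const int32_t b[8],
                           int32_t out[8])
{
    const __m256i va = _mm256_loadu_si256((const __m256i*)a);
    const __m256i vb = _mm256_loadu_si256((const __m256i*)b);

    /* Each 128-bit lane is interleaved on its own:
       lane 0 -> a0 b0 a1 b1, lane 1 -> a4 b4 a5 b5 */
    const __m256i r = _mm256_unpacklo_epi32(va, vb);

    _mm256_storeu_si256((__m256i*)out, r);
}
```

For a = {0, ..., 7} and b = {8, ..., 15} the result is {0, 8, 1, 9, 4, 12, 5, 13}, whereas an interleave of the low halves of the whole registers would give {0, 8, 1, 9, 2, 10, 3, 11}.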

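For me the most surprising are the packing instructions (vpack*). Because they work on each 128-bit lane separately, the output of a 256-bit pack comes out interleaved by lane, so additional shuffling (before or after the instruction) is required if we want to keep the order of values. In some cases the order is crucial.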
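
The ordering problem with packing, and one possible fix, can be sketched in C as follows. This is only an illustration under a few assumptions: the function names pack_epi32_inlane and pack_epi32_ordered are hypothetical, the shuffle is done after the pack with _mm256_permute4x64_epi64 (shuffling the inputs beforehand is an equally valid choice), and the target attribute is a GCC/Clang extension used so the snippet builds without -mavx2.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: plain vpackssdw, result is ordered per lane. */
__attribute__((target("avx2")))
void pack_epi32_inlane(const int32_t a[8], const int32_t b[8],
                       int16_t out[16])
{
    const __m256i va = _mm256_loadu_si256((const __m256i*)a);
    const __m256i vb = _mm256_loadu_si256((const __m256i*)b);

    /* 64-bit quarters of the result: a0..a3 | b0..b3 | a4..a7 | b4..b7 */
    const __m256i packed = _mm256_packs_epi32(va, vb);

    _mm256_storeu_si256((__m256i*)out, packed);
}

/* Hypothetical helper: vpackssdw followed by a 64-bit permutation
   that restores the natural order a0..a7, b0..b7. */
__attribute__((target("avx2")))
void pack_epi32_ordered(const int32_t a[8], const int32_t b[8],
                        int16_t out[16])
{
    const __m256i va = _mm256_loadu_si256((const __m256i*)a);
    const __m256i vb = _mm256_loadu_si256((const __m256i*)b);
    const __m256i packed = _mm256_packs_epi32(va, vb);

    /* Swap the two middle quarters: select quarters 0, 2, 1, 3. */
    const __m256i fixed =
        _mm256_permute4x64_epi64(packed, _MM_SHUFFLE(3, 1, 2, 0));

    _mm256_storeu_si256((__m256i*)out, fixed);
}
```

For a = {0, ..., 7} and b = {8, ..., 15}, pack_epi32_inlane yields {0, 1, 2, 3, 8, 9, 10, 11, 4, 5, 6, 7, 12, 13, 14, 15}, while pack_epi32_ordered yields the values 0 through 15 in order, at the cost of one extra instruction.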