Notes on computers, programming and all the stuff: Not everything in AVX2 is 256-bit

AVX2 has added support for 256-bit arguments for many operations on packed integers, although not all. Some instructions accept the 256-bit registers, but operates on 128-bit lanes rather whole register.

There are three major groups of instructions: packing (narrowing conversion), unpacking (interleave) and permutations; below is a full list of instructions (with intrinsics):

valignr (_mm256_alignr_epi8)
vpslldq (_mm256_bslli_epi128)
vpsrldq (_mm256_bsrli_epi128)
vmpsadbw (_mm256_mpsadbw_epu8)
vpacksswb (_mm256_packs_epi16)
vpackssdw (_mm256_packs_epi32)
vpackuswb (_mm256_packus_epi16)
vpackusdw (_mm256_packus_epi32)
vperm2i128 (_mm256_permute2x128_si256)
vpermq (_mm256_permute4x64_epi64)
vpermpd (_mm256_permute4x64_pd)
vpshufd (_mm256_shuffle_epi32)
vpshufb (_mm256_shuffle_epi8)
vpshufhw (_mm256_shufflehi_epi16)
vpshuflw (_mm256_shufflelo_epi16)
vpslldq (_mm256_slli_si256)
vpsrldq (_mm256_srli_si256)
vpunpckhwd (_mm256_unpackhi_epi16)
vpunpckhdq (_mm256_unpackhi_epi32)
vpunpckhqdq (_mm256_unpackhi_epi64)
vpunpckhbw (_mm256_unpackhi_epi8)
vpunpcklwd (_mm256_unpacklo_epi16)
vpunpckldq (_mm256_unpacklo_epi32)
vpunpcklqdq (_mm256_unpacklo_epi64)
vpunpcklbw (_mm256_unpacklo_epi8)

For me the most surprising are packing instructions (vpack*) as they require additional shuffling (after or before the instruction) if we want to keep order of values. In some cases the order is crucial.

Notes on computers, programming and all the stuff

sobota, 21 marca 2015

Not everything in AVX2 is 256-bit

Brak komentarzy:

Prześlij komentarz