The problem: there are 64-bit values with some data bits and some metadata bits; the metadata includes a k-bit field describing a "type" (k >= 0). The type field is located in the lower 32 bits.
The procedure processes two "types", one denoted by code 3 and the other by code 5. When all items are of type 3 we can use a fast AVX2 path; if any item is of type 5, we have to call an additional function (a virtual method, to be precise). Read more ...
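The "are all items of type 3?" test can be sketched as below. This is only an illustration, not the code from the post: the field layout (`TYPE_MASK`, the lowest 3 bits) and all function names are assumptions made for the example, and `__attribute__((target))` and `__builtin_cpu_supports` are GCC/Clang extensions.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the type field occupies the lowest 3 bits. */
#define TYPE_MASK UINT64_C(0x7)
#define TYPE_FAST UINT64_C(3)

/* Scalar reference: returns 1 when every item has type 3. */
int all_fast_scalar(const uint64_t *items, size_t n) {
    for (size_t i = 0; i < n; i++)
        if ((items[i] & TYPE_MASK) != TYPE_FAST)
            return 0;
    return 1;
}

/* AVX2 variant: checks four 64-bit items per iteration. */
__attribute__((target("avx2")))
int all_fast_avx2(const uint64_t *items, size_t n) {
    const __m256i mask = _mm256_set1_epi64x((long long)TYPE_MASK);
    const __m256i fast = _mm256_set1_epi64x((long long)TYPE_FAST);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256i v  = _mm256_loadu_si256((const __m256i *)(items + i));
        __m256i eq = _mm256_cmpeq_epi64(_mm256_and_si256(v, mask), fast);
        /* All 32 byte-mask bits are set only if all four items matched. */
        if (_mm256_movemask_epi8(eq) != -1)
            return 0;
    }
    /* Handle the remaining 0..3 items with the scalar loop. */
    return all_fast_scalar(items + i, n - i);
}

/* Dispatcher: uses AVX2 only when the CPU reports support for it. */
int all_fast(const uint64_t *items, size_t n) {
    if (__builtin_cpu_supports("avx2"))
        return all_fast_avx2(items, n);
    return all_fast_scalar(items, n);
}
```

When `all_fast` returns 1 the caller can take the fast path; otherwise it falls back to the slower per-item handling of type 5.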
Showing posts labeled avx.
Sunday, 22 March 2015
AVX512: ternary functions evaluation
Intel's SIMD instruction sets offer the following two-argument (binary) boolean functions: and, or, xor, and and-not. There is no single-argument not; it can be expressed as xor reg, ones, but this requires an additional, pre-set register.
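The xor-with-ones idiom can be sketched with SSE2 intrinsics as follows. The helper names `not_si128` and `not_lane0` are made up for this example.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Bitwise NOT of a 128-bit vector, computed as x XOR all-ones. */
__m128i not_si128(__m128i x) {
    /* This constant is the additional, pre-set register. */
    const __m128i ones = _mm_set1_epi32(-1);
    return _mm_xor_si128(x, ones);
}

/* Convenience wrapper for checking: negates v and returns 32-bit lane 0. */
uint32_t not_lane0(uint32_t v) {
    __m128i r = not_si128(_mm_set1_epi32((int)v));
    return (uint32_t)_mm_cvtsi128_si32(r);
}
```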
AVX512F will come with a very interesting instruction called vpternlog. Read more ...
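To give a feel for what vpternlog computes, here is a scalar model in plain C. It is a sketch of the semantics only, not AVX512 code: for every bit position, the three input bits form a 3-bit index (first operand as the most significant bit) into an 8-bit immediate that acts as a truth table. The function name `ternary_logic32` is an assumption of this example.

```c
#include <stdint.h>

/* Scalar model of a three-input boolean function selected by an 8-bit
   truth table, applied independently to each of the 32 bit positions. */
uint32_t ternary_logic32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm) {
    uint32_t result = 0;
    for (int bit = 0; bit < 32; bit++) {
        /* Index into the truth table: a is bit 2, b is bit 1, c is bit 0. */
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        result |= (uint32_t)((imm >> idx) & 1u) << bit;
    }
    return result;
}
```

For instance, the immediate 0x96 yields a three-way xor, 0xCA yields a bitwise select (a ? b : c), and 0x0F yields not a, so a single instruction covers the missing single-argument not as well.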
Saturday, 21 March 2015
Not everything in AVX2 is 256-bit
AVX2 has added support for 256-bit arguments for many operations on packed integers, although not all. Some instructions accept 256-bit registers, but operate on 128-bit lanes rather than on the whole register.
There are three major groups of instructions: packing (narrowing conversion), unpacking (interleave) and permutations; below is a full list of instructions (with intrinsics):
- vpalignr (_mm256_alignr_epi8)
- vpslldq (_mm256_bslli_epi128)
- vpsrldq (_mm256_bsrli_epi128)
- vmpsadbw (_mm256_mpsadbw_epu8)
- vpacksswb (_mm256_packs_epi16)
- vpackssdw (_mm256_packs_epi32)
- vpackuswb (_mm256_packus_epi16)
- vpackusdw (_mm256_packus_epi32)
- vperm2i128 (_mm256_permute2x128_si256)
- vpermq (_mm256_permute4x64_epi64)
- vpermpd (_mm256_permute4x64_pd)
- vpshufd (_mm256_shuffle_epi32)
- vpshufb (_mm256_shuffle_epi8)
- vpshufhw (_mm256_shufflehi_epi16)
- vpshuflw (_mm256_shufflelo_epi16)
- vpslldq (_mm256_slli_si256)
- vpsrldq (_mm256_srli_si256)
- vpunpckhwd (_mm256_unpackhi_epi16)
- vpunpckhdq (_mm256_unpackhi_epi32)
- vpunpckhqdq (_mm256_unpackhi_epi64)
- vpunpckhbw (_mm256_unpackhi_epi8)
- vpunpcklwd (_mm256_unpacklo_epi16)
- vpunpckldq (_mm256_unpacklo_epi32)
- vpunpcklqdq (_mm256_unpacklo_epi64)
- vpunpcklbw (_mm256_unpacklo_epi8)
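The lane-wise behaviour can be illustrated with vpunpcklqdq. The sketch below pairs a scalar model with the real intrinsic; the function names are made up for this example, and `__attribute__((target))` is a GCC/Clang extension.

```c
#include <immintrin.h>
#include <stdint.h>

/* Scalar model of vpunpcklqdq on 256-bit inputs: each 128-bit lane is
   interleaved on its own, taking the low 64-bit element of a and of b. */
void unpacklo_epi64_model(const uint64_t a[4], const uint64_t b[4],
                          uint64_t out[4]) {
    for (int lane = 0; lane < 2; lane++) {
        out[2 * lane + 0] = a[2 * lane];
        out[2 * lane + 1] = b[2 * lane];
    }
}

/* The same operation using the AVX2 intrinsic. */
__attribute__((target("avx2")))
void unpacklo_epi64_avx2(const uint64_t a[4], const uint64_t b[4],
                         uint64_t out[4]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_unpacklo_epi64(va, vb));
}
```

With a = {a0, a1, a2, a3} and b = {b0, b1, b2, b3} the result is {a0, b0, a2, b2}, not the {a0, b0, a1, b1} one would get from a true 256-bit interleave.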