The problem: we have 64-bit values made up of some data bits and some metadata bits; the metadata includes a k-bit field describing a "type" (k >= 0). The type field is located in the lower 32 bits.
The procedure processes two "types", one denoted by code 3 and the other by code 5. When all items are of type 3 we can use a fast AVX2 path; if any item is of type 5, we have to call an additional function (a virtual method, to be precise).
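As a point of reference, the dispatch decision can be sketched in scalar C++. This is only an illustration, not the post's AVX2 code: the post does not say where the type field sits or how wide it is, so `TYPE_SHIFT`, `TYPE_BITS`, `type_of` and `all_fast` are hypothetical names and values chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>

// ASSUMPTION: the real position and width of the type field are not given
// in the post; these values are placeholders for illustration only.
constexpr unsigned TYPE_SHIFT = 0;  // hypothetical bit offset in the low dword
constexpr unsigned TYPE_BITS  = 3;  // hypothetical k, wide enough for 3 and 5
constexpr std::uint32_t TYPE_MASK = (1u << TYPE_BITS) - 1u;

constexpr std::uint32_t TYPE_FAST = 3;  // items the fast path can handle

// Extract the type field from the lower 32 bits of a 64-bit value.
inline std::uint32_t type_of(std::uint64_t value) {
    const std::uint32_t lo = static_cast<std::uint32_t>(value);
    return (lo >> TYPE_SHIFT) & TYPE_MASK;
}

// Return true when every item has type 3, i.e. the fast path may be taken.
inline bool all_fast(const std::uint64_t* items, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (type_of(items[i]) != TYPE_FAST) {
            return false;  // some other type (e.g. 5) needs the slow path
        }
    }
    return true;
}
```

A vectorized version answers the same question for several items at once; the scalar form is mainly useful as a correctness check for it.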
Thank you for these enlightening SIMD posts! A few errata:
> auto A_type = convert(A_lo_type, A_hi_type); // PACKSSQD
PACKSSQD does not exist
> Version 2:
> auto any_5 = A_type + B_type + packed_dword(0x7fffff80) // PADDD x 2
0x7fffff80 -> 0x7fffffc0
> MOVMSK_PS
MOVMSKPS
> PSLRD
PSLLD