Notes on computers, programming and all the stuff: Transpose bits in byte using SIMD instructions

środa, 31 marca 2010

Transpose bits in byte using SIMD instructions

Method presented here allows to get any bit permutation, transposition is just one of possible operations. Lookup-based approach would be faster, but algorithm is worth to (re)show.

Algorithm outline for 8-byte vector (with SSE instruction it is possible to get 2 operations in parallel):

fill vector with given byte
[11010001] =>
[11010001|11010001|11010001|11010001|11010001|11010001|11010001|11010001]
leave one bit per byte
[10000000|01000000|00000000|00010000|00000000|00000000|00000000|00000001]
perform desired transposition ("move" bits around)
[00000001|000000010|00000000|00001000|00000000|00000000|00000000|10000000]
perform horizontal OR of all bytes
[10001011]

Here is my old MMX code (polish text); below SSE/SSE5 implementation details.

Ad 1. Series of punpcklbw/punpcklwb/shufps or pshufb if CPU supports SSSE3.

# 1.1
movd      %eax, %xmm0
shufps    $0x00, %xmm0, %xmm0
punpcklbw %xmm0, %xmm0
punpcklwd %xmm0, %xmm0

# 1.2
pxor      %xmm1, %xmm1
movd       %eax, %xmm0
pshufb    %xmm1, %xmm0

Ad 2. Simple pand with mask packed_qword(0x8040201008040201).

pand  MASK1, %xmm0

Ad 3. If plain SSE instructions are supported this step require some work. First each bit is populated to fill whole byte (using pcmpeq - we get negated result), then mask bits on desired positons.

SSE5 has powerful instruction protb that can do perform rotation of each byte with independent amount - so in this case just one instruction is needed.

# SSE
pcmpeq  %xmm1, %xmm0
pandn   MASK2, %xmm0

# SSE5
protb    ROT, %xmm0, %xmm0

Ad 4. Since bits are placed on distinct positions, we can use instruction psadbw, that calculate horizontal sums of bytewide differences from two registers (separately for low and high registers half). If one register is full of zeros, we get sum of bytes from other register.

psadbw  %xmm1, %xmm0
movd    %xmm0, %eax

Depending on instruction set three (SSE) or two (SSE5) additional tables are needed.

Brak komentarzy:

Prześlij komentarz

Notes on computers, programming and all the stuff

środa, 31 marca 2010

Transpose bits in byte using SIMD instructions

Brak komentarzy:

O mnie

Archiwum bloga

Etykiety

Bookmarks

Photos