if x > const_n then mask := 0xffffffff; else mask := 0x00000000;
Portable branchless solution:
- choose magic number M := (1 << (k-1)) - 1 - n, where k is a bit position, for example 31 if we operate on 32-bit words
- calculate R := x + M
- k-th bit of R is set if x > n
- fill mask with this bit - see note Fill word with selected bit
The key to understand this trick is binary form of M: 0111..1111zzzz, where z is 0 or 1 depending on n value. When x is greater then n, then x + M has form 1000..000zzzz, because carry bit propagate through series of ones to k-th position of result.
Real world example - branchless converting hex digit to ASCII (M=0x7ffffff6 for k=31 and n=9).
; input: eax - hex digit ; output: eax - ASCII letter (0-9, A-F or a-f) ; destroys: ebx andl 0xf, %eax leal 0x7ffffff6(%eax), %ebx ; MSB(ebx)=1 when eax >= 10 sarl $31, %ebx ; ebx - mask andl $7, %ebx ; ebx = 7 when eax >= 10 (for A-F letters) ;andl $39, %ebx ; ebx = 39 when eax >= 10 (for a-f letters) leal '0'(%eax, %ebx), %eax ; eax = '0' + eax + ebx => ASCII letter
It is also possible to convert 4 hex digits in parallel using similar algorithm, but input data have to be correctly prepared. Moreover generating mask requires 3 instructon and one extra register (in scalar version just one arithmetic shift). I guess it wont be fast on x86, maybe this approach would be good for SIMD code, where similar code transforms more bytes at once.
; input: eax - four hex digits in form [0a0b0c0d] ; output: eax - four ascii letters ; destroys: ebx, ecx leal 0x76767676(%eax), %ebx ; MSB of each byte is set when corresponding eax byte is >= 10 ; (here: 0x7f - 9 = 0x76) andl $0x80808080, %ebx movl %ebx, %ecx shrl $7, %ebx subl %ebx, %ecx ; ecx - byte-wise mask ;andl $0x07070707, %ecx ; for ASCII letters A-F andl $0x27272727, %ecx ; for ASCII letters a-f leal 0x30303030(%eax, %ecx), %eax ; ecx - four ascii letters
See also: SSSE3: printing hex values (weird use of PSHUFB instruction)
Brak komentarzy:
Prześlij komentarz