## czwartek, 1 kwietnia 2010

### Branchless signum

Problem: calculate value of sign(x):
• -1 when x < 0
• 0 when x = 0,
• +1 when x > 0.
My solution do not involve any hardware specific things like ALU flags nor special instructions - just plain AND, OR, shifts.
``` ; input: eax = X

movl %eax, %ebx
sarl \$31, %eax  // eax = -1 if X less then zero, 0 otherwise

andl \$0x7fffffff, %ebx
addl \$0x7fffffff, %ebx // MSB is set if any lower bits were set
shrl \$31, \$ebx  // eax = +1 if X greater then zero, 0 otherwise

orl %ebx, %eax  // eax = result
```

C99 implementation:
```int32_t sign(int32_t x) {
int32_t y;
y = (x & 0x7fffffff) + 0x7fffffff;
return (x >> 31) | ((uint32_t)y >> 31);
}
```

### Fill word with selected bit

This is continuation of subproblem from previous post: we have a word (byte, dword, whatever) and want to fill it with selected bit.

1. The most general algorithm:
 => 
2. clone word
a=, b=
3. shift bit in first word to MSB, and to LSB in second word
a=, b=
4. subtract c = a - b
c=
5. add missing MSB c = c OR a
c=
2. If arithmetic shifts are supported:
1. shift bit to MSB
a=
2. arithmetic shift right
a=
3. On processor 386+ we can copy selected bit from reg to carry flag (CF) with bt reg, reg and then clone CF with sbb reg, reg.
```#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

uint32_t fill1(uint32_t a, int bit) {
uint32_t b;

b = a = a & (1 << bit);

a <<= 31 - bit;
b >>= bit;

return (a - b) | a;
}

uint32_t fill2(uint32_t a, int bit) {
a <<= 31 - bit;
return (int32_t)(a) >> 31;
}

uint32_t fill386(uint32_t a, int bit) {
uint32_t result;
__asm__ __volatile__ (
"bt  %1, %0\n"
"sbb %0, %0\n"
: "=r" (result)
: "r" (bit), "0" (a)
);
return result;
}

int main(int argc, char* argv[]) {
uint32_t x, i;
if (argc > 1)
x = (unsigned long)strtol(argv, NULL, 0);

printf("input = %08x\n", x);
for (i=0; i < 32; i++)
printf("bit %2d: fill1 = %08x, fill2 = %08x, fill386 = %08x\n",
i,
fill1(x, i),
fill2(x, i),
fill386(x, i)
);

return EXIT_SUCCESS;
}
```