Thursday, 12 December 2013

Extensions to the x86 ISA are useless

Intel has announced a new extension to SSE: instructions that accelerate the calculation of SHA-1 and SHA-256 hashes. Like everything else added recently to the x86 ISA, these new instructions address special cases of "something". The number of instructions, encoding modes, etc. keeps growing, but the additions do not help in the general case.

Let's see what sha1msg1 xmm1, xmm2 does (the arguments are packed dwords; index 0 here is the most significant dword, matching W0 in Intel's description):

 result[0] := xmm1[0] xor xmm1[2]
 result[1] := xmm1[1] xor xmm1[3]
 result[2] := xmm1[2] xor xmm2[0]
 result[3] := xmm1[3] xor xmm2[1]

  1. The logical operation "xor" is hardcoded. Why can't we use "or", "and", or "and not"? These operations are already present in the ISA.
  2. The indices into xmm1 and xmm2 are hardcoded too. The pshufd instruction accepts an immediate argument (1 byte) that selects a permutation, so why couldn't sha1msg1 be fed 2 bytes that let the programmer select any permutation of the arguments?
  3. The sources of the operands are also hardcoded. Why not use another immediate (1 byte) to select the sources, for example 00b = xmm1/xmm1, 01b = xmm1/xmm2, 10b = xmm2/xmm1, 11b = xmm2/xmm2?

Such a generic instruction could be written as generic_op xmm1, xmm2, imm_1, imm_2, imm_3 and would execute the following algorithm:

 for i := 0 to 3 do
  arg1_index := imm_1[2*i:2*i + 1]
  arg2_index := imm_2[2*i:2*i + 1]

  if imm_3[2*i + 1] = 1 then
   arg1 := xmm2
  else
   arg1 := xmm1
  end if

  if imm_3[2*i] = 1 then
   arg2 := xmm2
  else
   arg2 := xmm1
  end if

  result[i] := arg1[arg1_index] op arg2[arg2_index]
 end for

With that, sha1msg1 is just a special case:

 generic_xor xmm1, xmm2, 0b11100100, 0b01001110, 0b01010000
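
To check that these three immediates really reproduce sha1msg1, here is a minimal Python sketch that emulates both operations on lists of four dwords. It is only an illustration: generic_op is the hypothetical instruction proposed in this post, not anything in a real ISA, and the sketch assumes the source-selector encoding from point 3 (in each 2-bit field of imm_3 the high bit picks the register for the first operand and the low bit picks it for the second, with 0 = xmm1 and 1 = xmm2). Dword indices follow the listing at the top of the post.

```python
import operator

MASK32 = 0xFFFFFFFF


def sha1msg1(xmm1, xmm2):
    """Reference result, copied from the four-line listing in the post."""
    return [
        xmm1[0] ^ xmm1[2],
        xmm1[1] ^ xmm1[3],
        xmm1[2] ^ xmm2[0],
        xmm1[3] ^ xmm2[1],
    ]


def generic_op(op, xmm1, xmm2, imm_1, imm_2, imm_3):
    """Emulate the hypothetical generic instruction on packed dwords."""
    result = []
    for i in range(4):
        # 2-bit element indices for the first and second operand
        arg1_index = (imm_1 >> (2 * i)) & 0b11
        arg2_index = (imm_2 >> (2 * i)) & 0b11
        # Assumed encoding: high bit selects arg1's register, low bit arg2's
        arg1 = xmm2 if (imm_3 >> (2 * i + 1)) & 1 else xmm1
        arg2 = xmm2 if (imm_3 >> (2 * i)) & 1 else xmm1
        result.append(op(arg1[arg1_index], arg2[arg2_index]) & MASK32)
    return result


a = [0x11111111, 0x22222222, 0x44444444, 0x88888888]
b = [0x01020304, 0x05060708, 0x090A0B0C, 0x0D0E0F10]

# generic_xor with the immediates from the post behaves like sha1msg1
assert generic_op(operator.xor, a, b,
                  0b11100100, 0b01001110, 0b01010000) == sha1msg1(a, b)
```

Swapping operator.xor for operator.or_ or operator.and_ gives the other variants asked for in point 1 without touching the immediates.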

Maybe this example is "too generic", too complex, and would be hard to implement in hardware. I just wanted to show that we are getting shiny new instructions that are useful in only a few cases. Compilers can vectorize loops and make use of SSE, but SHA is used in drivers and operating systems and is encapsulated in libraries, so sha1msg1 and friends will never appear in ordinary programs.