Let's see what sha1msg1 xmm1, xmm2 does. Both arguments are four packed dwords, and dword 0 here is the most significant one, matching W0 in Intel's pseudocode:
    result[0] := xmm1[0] xor xmm1[2]
    result[1] := xmm1[1] xor xmm1[3]
    result[2] := xmm1[2] xor xmm2[0]
    result[3] := xmm1[3] xor xmm2[1]
- The logical operation "xor" is hardcoded. Why can't we use "or", "and" or "and-not"? These operations are already present in the ISA (por, pand, pandn).
- The indices into xmm1 and xmm2 are hardcoded too. The pshufd instruction takes a 1-byte immediate that selects, for every destination dword, which source dword goes there. Why couldn't sha1msg1 be fed 2 bytes (one per operand), letting the programmer choose any arrangement of the arguments?
- The sources of the operands are also hardcoded. Why not use another 1-byte immediate to select them, 2 bits per result dword? For example (first/second operand): 00b = xmm1/xmm1, 01b = xmm1/xmm2, 10b = xmm2/xmm1, 11b = xmm2/xmm2.
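For reference, this is roughly how pshufd interprets its immediate, written as a scalar C model. It is a sketch for illustration only; pshufd_model is a hypothetical helper, not an Intel API. Note that in this model dword 0 is the least significant one, as in Intel's own pshufd description.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PSHUFD (hypothetical helper, not an Intel API).
   Each 2-bit field of the 8-bit immediate selects which source dword is
   copied to the corresponding destination dword. Dword 0 is the least
   significant one (bits 31:0). */
static void pshufd_model(uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
    assert(imm <= 0xFF);                     /* the real instruction takes an imm8 */
    uint32_t tmp[4];                         /* lets dst and src be the same array */
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];  /* field i is imm[2i+1:2i] */
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

For example, the immediate 0x1B (what `_MM_SHUFFLE(0, 1, 2, 3)` expands to) reverses the four dwords, and 0xE4 leaves them in place. The proposal simply applies the same 2-bits-per-dword encoding to each operand of sha1msg1.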
Put together, the instruction could work like this (op stands for the logical operation, e.g. xor in generic_xor):

    for i := 0 to 3 do
        arg1_index := imm_1[2*i:2*i + 1]
        arg2_index := imm_2[2*i:2*i + 1]
        if imm_3[2*i + 1] = 1 then arg1 := xmm2 else arg1 := xmm1 end if
        if imm_3[2*i] = 1 then arg2 := xmm2 else arg2 := xmm1 end if
        result[i] := arg1[arg1_index] op arg2[arg2_index]
    end for
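The pseudocode can be modelled in plain C to experiment with encodings. This is only a sketch under stated assumptions: generic_op, apply_op and the generic_op_kind enum are hypothetical names, and passing op as a parameter stands in for a family of instructions such as generic_xor. In each 2-bit field of imm3, the high bit selects xmm2 for the first operand and the low bit selects xmm2 for the second, which gives the 00b/01b/10b/11b table from the list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation selector; the four ops mirror pxor/por/pand/pandn. */
enum generic_op_kind { OP_XOR, OP_OR, OP_AND, OP_ANDN };

static uint32_t apply_op(enum generic_op_kind op, uint32_t a, uint32_t b)
{
    switch (op) {
    case OP_XOR:  return a ^ b;
    case OP_OR:   return a | b;
    case OP_AND:  return a & b;
    case OP_ANDN: return ~a & b;   /* pandn style: (NOT first) AND second */
    }
    assert(0 && "unknown op");
    return 0;
}

/* Model of the proposed instruction (hypothetical).
   imm1/imm2: 2-bit dword index of the first/second operand per result dword.
   imm3: 2-bit source field per result dword; high bit -> first operand from
   xmm2, low bit -> second operand from xmm2. */
static void generic_op(enum generic_op_kind op, uint32_t result[4],
                       const uint32_t xmm1[4], const uint32_t xmm2[4],
                       unsigned imm1, unsigned imm2, unsigned imm3)
{
    assert(imm1 <= 0xFF && imm2 <= 0xFF && imm3 <= 0xFF);
    uint32_t tmp[4];   /* result may alias xmm1, like a destructive x86 form */
    for (int i = 0; i < 4; i++) {
        unsigned idx1 = (imm1 >> (2 * i)) & 3;
        unsigned idx2 = (imm2 >> (2 * i)) & 3;
        unsigned src  = (imm3 >> (2 * i)) & 3;
        const uint32_t *arg1 = (src & 2) ? xmm2 : xmm1;
        const uint32_t *arg2 = (src & 1) ? xmm2 : xmm1;
        tmp[i] = apply_op(op, arg1[idx1], arg2[idx2]);
    }
    for (int i = 0; i < 4; i++)
        result[i] = tmp[i];
}
```

With imm1 = imm2 = 0xE4 (identity) and imm3 = 0x55 (01b in every field), generic_op with OP_OR reduces to a plain por of xmm1 and xmm2, which illustrates how the existing lane-wise instructions fall out as special cases too.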
Then sha1msg1 is just a special case:
generic_xor xmm1, xmm2, 0b11100100, 0b01001110, 0b01010000
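To check that claim mechanically, the sketch below compares a scalar model of sha1msg1, transcribed from the listing above with its dword numbering, against an XOR-only model of the generic instruction using the three immediates from the text. The immediates are written in hex because 0b literals are not standard C before C23. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* sha1msg1 as written in the listing above (same dword numbering). */
static void sha1msg1_model(uint32_t r[4], const uint32_t x1[4], const uint32_t x2[4])
{
    uint32_t t[4] = {
        x1[0] ^ x1[2],
        x1[1] ^ x1[3],
        x1[2] ^ x2[0],
        x1[3] ^ x2[1],
    };
    memcpy(r, t, sizeof t);
}

/* XOR-only model of the hypothetical generic instruction:
   imm1/imm2 pick dword indices; in each 2-bit field of imm3 the high bit
   takes the first operand from x2, the low bit the second operand. */
static void generic_xor(uint32_t r[4], const uint32_t x1[4], const uint32_t x2[4],
                        unsigned imm1, unsigned imm2, unsigned imm3)
{
    assert(imm1 <= 0xFF && imm2 <= 0xFF && imm3 <= 0xFF);
    uint32_t t[4];
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm3 >> (2 * i)) & 3;
        const uint32_t *a = (sel & 2) ? x2 : x1;
        const uint32_t *b = (sel & 1) ? x2 : x1;
        t[i] = a[(imm1 >> (2 * i)) & 3] ^ b[(imm2 >> (2 * i)) & 3];
    }
    memcpy(r, t, sizeof t);
}

/* Returns 1 if generic_xor with the immediates from the text
   agrees with sha1msg1_model on the given inputs. */
static int special_case_matches(const uint32_t x1[4], const uint32_t x2[4])
{
    uint32_t expected[4], got[4];
    sha1msg1_model(expected, x1, x2);
    generic_xor(got, x1, x2,
                0xE4,   /* 0b11100100 */
                0x4E,   /* 0b01001110 */
                0x50);  /* 0b01010000 */
    return memcmp(expected, got, sizeof got) == 0;
}
```

Since every lane of both models is a pure function of the inputs, agreement on a few arbitrary inputs is a quick sanity check; tracing the four 2-bit fields of each immediate by hand confirms it in general.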
Maybe this example is "too generic": too complex and hard to implement in hardware. I just wanted to point out that what we get instead are shiny new instructions that are useful in only a few cases. Compilers can vectorize loops and make use of SSE, but SHA is used in drivers and operating systems and is encapsulated in libraries, so sha1msg1 and friends will never appear in ordinary programs.