    00000000000000D9 mov byte ptr [rcx+1Fh],dl
    00000000000000DC mov byte ptr [rdx+rcx],0
    00000000000000E0 movzx edx,byte ptr [rcx+1Fh]
    00000000000000E4 movzx eax,dl
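First, dl is stored to memory. That's fine. Then rdx is used as an address offset. Also fine. So it seems the compiler knows that the upper bits of rdx are zero, i.e. rdx holds the zero-extended value of dl. Not so fast: edx is then reloaded from memory with the value it already holds, and eax is filled with that same value. Provided the compiler can prove that the byte store to [rdx+rcx] does not overwrite [rcx+1Fh] (that is, edx != 1Fh), these two movzx instructions could be replaced with a single mov eax, edx.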
And another:
    00000000000000BF movaps xmm6,xmmword ptr [foo]
    00000000000000C4 movdqa xmmword ptr [foo],xmm6
Yes: the same value is loaded and immediately stored back. Completely useless! Moreover, xmm6 isn't used in the code that follows. (It's worth noting that the load, movaps, executes in the floating-point domain and the store, movdqa, in the integer domain. On some processors, such inter-domain transfers cost an additional cycle.)
The instruction sequences above were produced by the newest Microsoft compiler at the time of writing (MSVC 2013, Update 3).