Sunday, 30 November 2014

Conversion of int and double to string - comparison

Milo Yip has compared different itoa and dtoa implementations on a Core i7, including my itoa algorithm 2, which uses SSE2 instructions.

The results for itoa are interesting: the SSE2 version is not as good as it seemed to be. The tricky branchlut algorithm is only 10% slower and, moreover, is perfectly portable. One obvious drawback of this method is its use of a lookup table: in a real environment, where there is heavy pressure on the cache, memory access could become a bottleneck.
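To illustrate the lookup-table idea, here is a minimal sketch in C. It is not Milo Yip's branchlut code; it only shows the general technique of emitting two decimal digits per table access. The names digit_pairs and u32_to_str are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Entry i (two chars at offset 2*i) holds the ASCII digits of i, 00..99.
   This is the 200-byte table that competes for cache space. */
static const char digit_pairs[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Hypothetical helper: writes the decimal form of value into out
   (caller provides at least 11 bytes) and returns the digit count. */
static int u32_to_str(uint32_t value, char *out) {
    char tmp[10];                      /* a 32-bit value has at most 10 digits */
    char *p = tmp + sizeof tmp;        /* fill the buffer from the end */

    while (value >= 100) {
        uint32_t rem = value % 100;    /* lowest two digits */
        value /= 100;
        p -= 2;
        memcpy(p, &digit_pairs[2 * rem], 2);
    }
    if (value >= 10) {                 /* two leading digits left */
        p -= 2;
        memcpy(p, &digit_pairs[2 * value], 2);
    } else {                           /* a single leading digit left */
        *--p = (char)('0' + value);
    }

    size_t len = (size_t)(tmp + sizeof tmp - p);
    memcpy(out, p, len);
    out[len] = '\0';
    return (int)len;
}
```

Every loop iteration reads from the table, so the speed of this approach depends on the table staying in cache.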

Saturday, 22 November 2014

Simple Testing Can Prevent Most Critical Failures

I recommend a very interesting paper. The authors studied many failures in complicated distributed systems, like Cassandra, and found that the majority of failures are caused by trivial errors (some of them can be detected even by unit tests). Here is a very interesting quote:

almost all (92%) of the catastrophic system failures are the result of incorrect handling of non-fatal errors explicitly signaled in software.

In my opinion, the causes of errors identified in the study may apply to any kind of software.

Sunday, 16 November 2014

Speeding up searching in a linked list

It sounds crazy, but it's possible in some cases. Here are the results of the experiments; 3 times faster isn't so bad.

list                          :      0.780s, speedup  1.00
array list (4)                :      0.703s, speedup  1.11
array list (8)                :      0.515s, speedup  1.51
SIMD array list (4)           :      0.365s, speedup  2.14
SIMD array list (8)           :      0.258s, speedup  3.03
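The post does not show the data structures behind these numbers. The sketch below assumes that "array list (4)" means a linked list whose nodes each hold a small array of 4 values; that reading is my guess, not something stated above. All names (CHUNK, chunk_node, chunk_list_find) are hypothetical, and the SIMD variant is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK 4  /* assumed meaning of "(4)" in the table: values per node */

/* A list node that stores several values instead of one. */
struct chunk_node {
    struct chunk_node *next;
    size_t count;             /* number of used slots, 0..CHUNK */
    int32_t values[CHUNK];
};

/* Returns the node that contains key, or NULL if key is absent.
   Scanning one node reads a contiguous array, so the search follows
   one pointer per CHUNK values rather than one pointer per value. */
static const struct chunk_node *
chunk_list_find(const struct chunk_node *head, int32_t key) {
    for (const struct chunk_node *n = head; n != NULL; n = n->next) {
        for (size_t i = 0; i < n->count; i++) {
            if (n->values[i] == key) {
                return n;
            }
        }
    }
    return NULL;
}
```

With a layout like this, the inner loop over values is also a natural candidate for SIMD comparison, which may be what the "SIMD array list" rows measure.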

Friday, 14 November 2014

MSVC 2013 Release code

Today I had quite a long session with a debugger and release code, you know: no debug symbols and optimized code. I spotted two pieces of assembly that made me check whether the compilation mode was really set to release.

00000000000000D9  mov         byte ptr [rcx+1Fh],dl 
00000000000000DC  mov         byte ptr [rdx+rcx],0 
00000000000000E0  movzx       edx,byte ptr [rcx+1Fh] 
00000000000000E4  movzx       eax,dl 

First, dl is stored in memory. That's OK. Then edx is used as an offset. Also OK. It seems that the compiler knows the highest bits of edx are zeros, i.e. edx = dl. Not so fast: edx is reloaded with its original value, and then eax is populated with the same value. Unless the store to [rdx+rcx] can overwrite the byte at [rcx+1Fh] (which happens only when rdx = 1Fh), these two movzx could be replaced with a single mov eax, edx.

And another:

00000000000000BF  movaps      xmm6,xmmword ptr [foo] 
00000000000000C4  movdqa      xmmword ptr [foo],xmm6 

Yes, a load and a store of the same value. Completely useless! Moreover, xmm6 isn't used in the following code. (It's worth noting that the load is done by an FP unit and the store by an integer unit; inter-unit transfers cost one additional cycle on some processors.)

The above instruction sequences were produced by the newest compiler from Microsoft (MSVC 2013, Update 3).