Tuesday, 21 April 2015
The Influence of Malloc Placement on TSX Hardware Transactional Memory
Interesting paper:
We show that the placement policies of dynamic storage allocators -- such as those found in common "malloc" implementations -- can influence the conflict miss rate in the L1 data cache. Conflict misses -- sometimes called mapping misses -- arise because of less than ideal associativity and represent imbalanced distribution of active memory blocks over the set of available L1 indices. Under transactional execution conflict misses may manifest as aborts, representing wasted or futile effort instead of a simple stall as would occur in normal execution mode.
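To make the "L1 indices" from the quoted abstract concrete, here is a minimal sketch that computes which cache set an address maps to and how many blocks land in the most crowded set. The cache geometry (32 KiB, 8-way, 64-byte lines) is an assumption typical for Intel cores of that era, not a figure taken from the paper, and the helper names `l1_set_index` and `max_set_occupancy` are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 data cache geometry (not taken from the paper):
   32 KiB, 8-way set associative, 64-byte lines, hence 64 sets. */
enum {
    LINE_SIZE  = 64,
    WAYS       = 8,
    CACHE_SIZE = 32 * 1024,
    NUM_SETS   = CACHE_SIZE / (LINE_SIZE * WAYS)
};

/* Set index selected by an address (address bits 6..11 under the
   geometry assumed above). */
static unsigned l1_set_index(uintptr_t addr) {
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Largest number of blocks whose first cache line maps to one set.
   A result above WAYS means the blocks cannot all stay in L1 at once. */
static unsigned max_set_occupancy(const uintptr_t *addrs, size_t n) {
    unsigned count[NUM_SETS] = {0};
    unsigned worst = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned c = ++count[l1_set_index(addrs[i])];
        if (c > worst)
            worst = c;
    }
    return worst;
}
```

Under these assumptions, nine blocks placed exactly 4096 bytes apart all map to the same set and exceed its 8 ways, while the same nine blocks placed 4160 bytes apart spread over nine different sets. This is the kind of imbalance a placement policy can create or avoid.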
Sunday, 19 April 2015
Converting numbers to binary ASCII representation - new method
Recently I compared different methods of converting numbers to their binary representation, including the use of the new PDEP instruction from the BMI2 extension.
Today I updated the article with a new SWAR version 2, a tricky use of multiplication. The method is not faster, but I like the approach: under certain conditions multiplication can be seen as a multi-shift/bit-or instruction. I've already used multiplication in this way to emulate the pmovmskb instruction.
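The idea of treating multiplication as a multi-shift/bit-or operation can be sketched in portable C. This is an illustration of the general trick, not necessarily the exact "SWAR version 2" from the article; the function names `byte_to_binary_ascii` and `movemask_u64` are hypothetical. The trick works when the shifted copies of the operand never overlap, so no carries occur and the sum of partial products equals their bitwise OR.

```c
#include <stdint.h>

/* Write the 8 bits of x as '0'/'1' characters, most significant bit
   first, followed by a terminating NUL. */
static void byte_to_binary_ascii(uint8_t x, char out[9]) {
    /* Multiplication as multi-shift + OR: the multiplier has one set bit
       per byte and x fits in a byte, so the partial products do not
       overlap: rep == x | x << 8 | ... | x << 56. */
    uint64_t rep = x * UINT64_C(0x0101010101010101);

    /* Keep a different bit in each byte: bit 7 in byte 0, ..., bit 0 in byte 7. */
    uint64_t bits = rep & UINT64_C(0x0102040810204080);

    /* Normalize every non-zero byte to 1: adding 0x7f sets bit 7 of a byte
       exactly when that byte was non-zero, without carrying into the next byte. */
    uint64_t ones = ((bits + UINT64_C(0x7f7f7f7f7f7f7f7f)) >> 7)
                  & UINT64_C(0x0101010101010101);

    /* Turn 0/1 into ASCII '0'/'1'. */
    uint64_t ascii = ones + UINT64_C(0x3030303030303030);

    /* Store byte by byte so the result does not depend on endianness. */
    for (int i = 0; i < 8; i++)
        out[i] = (char)((ascii >> (8 * i)) & 0xff);
    out[8] = '\0';
}

/* Scalar analogue of pmovmskb for 8 bytes packed in a 64-bit word:
   gathers the most significant bit of each byte into an 8-bit mask. */
static unsigned movemask_u64(uint64_t v) {
    uint64_t msbs = v & UINT64_C(0x8080808080808080);
    /* Each set bit of the multiplier shifts msbs left by a multiple of 7,
       which lines up one byte's MSB per position in the top byte. */
    return (unsigned)((msbs * UINT64_C(0x0002040810204081)) >> 56);
}
```

In both functions a single multiplication replaces a chain of shifts and ORs. That makes the code compact, although, as noted above, not necessarily faster.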
Monday, 13 April 2015
Speeding up bit-parallel population count
Nearly 50% faster than the naive version for large data sets. Discovered by accident. :)
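For reference, the well-known bit-parallel (SWAR) population count that such speedups are usually measured against looks roughly like this. It is only the textbook baseline; the faster variant from the article is not reproduced here, and the names `popcount64_swar` and `popcount_buffer` are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Textbook bit-parallel population count of one 64-bit word. */
static unsigned popcount64_swar(uint64_t x) {
    /* Sums of adjacent bits: every 2-bit field holds 0..2. */
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    /* Sums of adjacent 2-bit fields: every 4-bit field holds 0..4. */
    x = (x & UINT64_C(0x3333333333333333))
      + ((x >> 2) & UINT64_C(0x3333333333333333));
    /* Sums of adjacent nibbles: every byte holds 0..8. */
    x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
    /* Add all bytes into the top byte with a single multiplication. */
    return (unsigned)((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* Naive use on a buffer: every word is reduced completely on its own. */
static uint64_t popcount_buffer(const uint64_t *data, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += popcount64_swar(data[i]);
    return total;
}
```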
Thursday, 9 April 2015
GitHub repositories
I've put the source code for two of my articles on GitHub:
- SSSE3: fast popcount --- repository
- SSE4 string search - modification of Karp-Rabin algorithm --- repository
BTW, the article about popcount has gained popularity, and I hope that another crazy idea, hacking MPSADBW, will spread all over the world.
SIMD-ized searching in unique constant dictionary
The problem: there is an ordered dictionary containing only unique keys. The dictionary is read-only, and keys are 32-bit (SSE) or 64-bit (AVX2).
Read more
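One possible way to use SIMD for this problem is sketched below: a linear scan that compares four 32-bit keys at once with SSE2 and stops early thanks to the ordering. This is an assumption about how such a search could look, not the method from the linked article. The function name `find_key` is hypothetical, and a scalar fallback is used when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns the index of needle in keys (sorted ascending, unique),
   or -1 if the key is absent. */
static ptrdiff_t find_key(const int32_t *keys, size_t n, int32_t needle) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i vneedle = _mm_set1_epi32(needle);
    for (; i + 4 <= n; i += 4) {
        /* Compare four keys with the needle in one instruction. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(keys + i));
        __m128i eq = _mm_cmpeq_epi32(chunk, vneedle);
        int mask = _mm_movemask_ps(_mm_castsi128_ps(eq));
        if (mask != 0) {
            /* Keys are unique, so exactly one lane matched. */
            int lane = 0;
            while ((mask & 1) == 0) {
                mask >>= 1;
                lane++;
            }
            return (ptrdiff_t)i + lane;
        }
        /* Keys are sorted: once the block's last key exceeds the needle,
           the needle cannot appear later. */
        if (keys[i + 3] > needle)
            return -1;
    }
#endif
    /* Scalar tail (or the whole search when SSE2 is unavailable). */
    for (; i < n; i++) {
        if (keys[i] == needle)
            return (ptrdiff_t)i;
        if (keys[i] > needle)
            break;
    }
    return -1;
}
```

The same pattern would presumably extend to 64-bit keys with AVX2, where one 256-bit register also holds four keys.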