wtorek, 29 grudnia 2015
Fast conversion of floating-point values to string
The conversion to string could be 15 times faster than sprintf. Read more...
niedziela, 27 grudnia 2015
Base64 encoding — implementation study
Although base64 encoding is a very basic algorithm, it could be sped up a little (25% sounds good?) Read more...
sobota, 26 grudnia 2015
Benefits from the obsession
Everything has started few years ago when I found John Regher's blog. If
you don't know the blog I highly recommend it. Among other things (I like
the photos!) the author studies bugs in compilers, undefined behaviours and
similar things. The word "overflow" appears quite often in his posts due to
the great number of errors related to improper use of the integer
arithmetic. Well, I don't know when the obsession has exactly started, but
recently I realized that I am alert of all integer operations in my
programs. Read more...
sobota, 28 listopada 2015
Implicit conversion - the enemy
I wrote:
I forget that pad_left signature is string, int, char and the char parameter has a default value. My mistake, without doubts.
This is another example of dark sides of the implicit conversions. C++ converts between characters and integers seamlessly. These two beast are distinct in the nature. Of course characters are represented by the numbers, however it's an implementation detail.
One can say: you made a mistake and now blame the language. No, I blame language's design. I'm afraid that we end up with something like Integer and int to overcome such problems.
Lesson learned: never use default parameters in public API (surprise!)
result += string_utils::pad_left(string, '0');
I forget that pad_left signature is string, int, char and the char parameter has a default value. My mistake, without doubts.
This is another example of dark sides of the implicit conversions. C++ converts between characters and integers seamlessly. These two beast are distinct in the nature. Of course characters are represented by the numbers, however it's an implementation detail.
One can say: you made a mistake and now blame the language. No, I blame language's design. I'm afraid that we end up with something like Integer and int to overcome such problems.
Lesson learned: never use default parameters in public API (surprise!)
niedziela, 22 listopada 2015
Another C++ nasty feature
I'm fond of C++ weirdness, really. This language is full of traps, and it shocks me once in a while.
Let's look at this piece of code, a part of a larger module:
We would expect that in case of an error following line will be reported: "user has entered wrong time: 123 PM". Obvious. But please look closer at the code, do you see any mistake? There is one... dirty... hard to notice. I'll give you a minute.
So, the mistake is lack of comma between expressions *clock_hour and *am_pm_clock. However, the code is valid! It compiles! And it took me a little longer than a minute to understand what happened. Explanation is:
In result method is called with a single parameter of type cont char*.
It's bizarre, it's terrible. A language should help a programmer. In my opinion implicit conversions is the worst feature of C++.
Let's look at this piece of code, a part of a larger module:
void validate_date() { // ... boost::optional<unsigned> clock_hour; boost::optional<unsigned> am_pm_clock; // ... fill these fields if (some sanity check failed) { report_error("user has entered wrong time: %d %s", *clock_hour *am_pm_clock ? "AM" : "PM"); } }
We would expect that in case of an error following line will be reported: "user has entered wrong time: 123 PM". Obvious. But please look closer at the code, do you see any mistake? There is one... dirty... hard to notice. I'll give you a minute.
So, the mistake is lack of comma between expressions *clock_hour and *am_pm_clock. However, the code is valid! It compiles! And it took me a little longer than a minute to understand what happened. Explanation is:
- *clock_hour evaluates to expression of type unsigned;
- then compiler sees * - a multiplication operator;
- so checks if multiplication of unsigned (on the left side) with boost::optional<unsigned> (on the right side) is possible;
- it is, because boost::optional<T> has conversion operator to type T.
((*clock_hour) * unsigned(am_pm_clock)) ? "AM" : "PM"
In result method is called with a single parameter of type cont char*.
It's bizarre, it's terrible. A language should help a programmer. In my opinion implicit conversions is the worst feature of C++.
niedziela, 15 listopada 2015
Short report from code::dive 2015
Few days ago I attended code::dive 2015, an IT conference in Wrocław, Poland. It was a one-day conference with a great number of presentations. There were four stages and five sessions, in total 20 talks. Impressive number! But an attender had to choose his own path of just five lectures. I think the decision was pretty difficult. Sometimes less is better. Read more
środa, 15 lipca 2015
C++ magick
A programmer wrote:
Do you see the mistake? Programmer assumed that expression "Invalid index: " + index evaluates to std::string("Invalid index: <some number>").
In fact type of expression "Invalid index: " is char[15], so char[15] + integer results in --- more or less --- char*. For index in range [0, 15] exception will carry tail of the message; for example when index=10 then it will be "dex: ". But for indexes larger than 15 and less than 0 program likely crash.
This is why I hate C++, the language has many dark corners, stupid conventions, implicit conversion, not to mention UB ("just" 150 UB, if you're curious).
class container; class IndexOutOfBounds { public: IndexOutOfBounds(const std::string& msg); }; void container::remove(int index) { if (index < 0 || index >= size()) { throw new IndexOutOfBounds("Invalid index: " + index); } // the rest of method }
Do you see the mistake? Programmer assumed that expression "Invalid index: " + index evaluates to std::string("Invalid index: <some number>").
In fact type of expression "Invalid index: " is char[15], so char[15] + integer results in --- more or less --- char*. For index in range [0, 15] exception will carry tail of the message; for example when index=10 then it will be "dex: ". But for indexes larger than 15 and less than 0 program likely crash.
This is why I hate C++, the language has many dark corners, stupid conventions, implicit conversion, not to mention UB ("just" 150 UB, if you're curious).
sobota, 20 czerwca 2015
Implementation of BT-trees
Great paper by Lars F. Bonnichsen, Christian W. Probst, Sven Karlsson:
This document presents the full implementation details of BT-trees, a highly efficient ordered map, and an evaluation which compares BT-trees with unordered maps. BT- trees are often much faster than other ordered maps, and have comparable performance to unordered map implementations. However, in benchmarks which favor unordered maps, BT-trees are not faster than the fastest unordered map implementations we know of.
Boolean function for the rescue
The problem is defined as follows: a set of features is saved using bit-sets (usually large), and there is a list/map/whatever of sets containing features of different objects. We have to find which features are unique. Read more...
środa, 10 czerwca 2015
Big progress in verification
Formal verification is not easy task, for example ComCert compiler is able to verify, that optimizations haven't modified semantic of a program. Paper Verified correctness and security of OpenSSL HMAC describes verification of the whole "stack":
We have proved, with machine-checked proofs in Coq, that an OpenSSL implementation of HMAC with SHA-256 correctly implements its FIPS functional specification and that its functional specification guarantees the expected cryptographic properties. This is the first machine-checked cryptographic proof that combines a source-program implementation proof, a compiler-correctness proof, and a cryptographic-security proof, with no gaps at the specification interfaces.
piątek, 22 maja 2015
Fast exact summation using small and large superaccumulators
Interesting article by Radford M. Neal:
I present two new methods for exactly summing a set of floating-point numbers, and then correctly rounding to the nearest floating-point number. Higher accuracy than simple summation (rounding after each addition) is important in many applications, such as finding the sample mean of data.
środa, 20 maja 2015
Optimizing Dijkstra for real-world performance
Another interesting paper:
Our experimental results currently put our prototype implementation at about twice as fast as the Boost implementation of the algorithm on both real-world and generated large graphs. Furthermore, this preliminary implementation was written in only a few weeks, by a single programmer. The fact that such an early prototype compares favorably against Boost, a well-known open source library developed by expert programmers, gives us reason to believe our design for the queue is indeed better suited to the problem at hand, and the favorable time measurements are not a product of any specific implementation technique we employed.
wtorek, 21 kwietnia 2015
The Influence of Malloc Placement on TSX Hardware Transactional Memory
Interesting paper:
We show that the placement policies of dynamic storage allocators -- such as those found in common "malloc" implementations -- can influence the L1 conflict miss rate in the L1. Conflict misses -- sometimes called mapping misses -- arise because of less than ideal associativity and represent imbalanced distribution of active memory blocks over the set of available L1 indices. Under transactional execution conflict misses may manifest as aborts, representing wasted or futile effort instead of a simple stall as would occur in normal execution mode.
niedziela, 19 kwietnia 2015
Conversion numbers to binary ASCII representation - new method
Recently I've checked different methods to convert numbers to binary representation, including use of new PDEP instruction from BMI2 extension.
Today I've updated the article with new SWAR version 2, a tricky use of multiplication. The method is not faster, but I like the approach---in certain conditions multiplication can be seen as multi-shift/bit-or instruction. I've already use multiplication in this way to emulate instruction pmovmskb.
Today I've updated the article with new SWAR version 2, a tricky use of multiplication. The method is not faster, but I like the approach---in certain conditions multiplication can be seen as multi-shift/bit-or instruction. I've already use multiplication in this way to emulate instruction pmovmskb.
poniedziałek, 13 kwietnia 2015
Speeding up bit-parallel population count
Nearly 50% faster than naive version for large data sets. Discovered by accident. :)
czwartek, 9 kwietnia 2015
Github repositories
I've put source code for my two articles at github:
BTW the article about popcount has gained popularity, and I hope another crazy idea about hacking MPSADBW will spread all over the world.
- SSSE3: fast popcount --- repository
- SSE4 string search - modification of Karp-Rabin algorithm --- repository
BTW the article about popcount has gained popularity, and I hope another crazy idea about hacking MPSADBW will spread all over the world.
SIMD-ized searching in unique constant dictionary
The problem: there is a ordered dictionary containing only
unique keys. Dictionary is read only, and keys are 32-bit (SSE) or
64-bit (AVX2). Read more
niedziela, 22 marca 2015
SIMD: detecting a bit pattern
The problem: there are 64-bit values with some data bits and some metadata bits; metadata includes a k-bit field describing a "type" (k >= 0). Type field is located in a lower 32-bits.
Procedure processes two "types", one denoted with code 3 and another with 5. When all items are of type 3 then we can use a fast AVX2 path, if there are some types 5, we have to call an additional function (a virtual method, to be precise). Read more ...
Procedure processes two "types", one denoted with code 3 and another with 5. When all items are of type 3 then we can use a fast AVX2 path, if there are some types 5, we have to call an additional function (a virtual method, to be precise). Read more ...
Compiler warnings are your future errors
Months ago I was asked to upgrade GCC from version 4.7 to 4.9 and also cleanup configure scripts. Not very exciting, merely time consuming task. Oh, we were hit by a bug in libstdc++, but simple patch have fixed the problem. Few weeks later I was asked to change GCC switch from -std=c++11 to -std=c++14 -- the easiest task in the world. I had to modify single script, run configure, type make, then run tests... everything was OK. Quite boring so far. Read more ...
AVX512: ternary functions evaluation
Intel's version of SIMD offers following 2-argument (binary) boolean functions: and, or, xor, and not. There isn't a single argument not, this function can be expressed with xor reg, ones, however this require additional, pre-set register.
AVX512F will come with very interesting instruction called vpternlog. Read more ...
AVX512F will come with very interesting instruction called vpternlog. Read more ...
sobota, 21 marca 2015
Not everything in AVX2 is 256-bit
AVX2 has added support for 256-bit arguments for many operations on packed integers, although not all. Some instructions accept the 256-bit registers, but operates on 128-bit lanes rather whole register.
There are three major groups of instructions: packing (narrowing conversion), unpacking (interleave) and permutations; below is a full list of instructions (with intrinsics):
There are three major groups of instructions: packing (narrowing conversion), unpacking (interleave) and permutations; below is a full list of instructions (with intrinsics):
- valignr (_mm256_alignr_epi8)
- vpslldq (_mm256_bslli_epi128)
- vpsrldq (_mm256_bsrli_epi128)
- vmpsadbw (_mm256_mpsadbw_epu8)
- vpacksswb (_mm256_packs_epi16)
- vpackssdw (_mm256_packs_epi32)
- vpackuswb (_mm256_packus_epi16)
- vpackusdw (_mm256_packus_epi32)
- vperm2i128 (_mm256_permute2x128_si256)
- vpermq (_mm256_permute4x64_epi64)
- vpermpd (_mm256_permute4x64_pd)
- vpshufd (_mm256_shuffle_epi32)
- vpshufb (_mm256_shuffle_epi8)
- vpshufhw (_mm256_shufflehi_epi16)
- vpshuflw (_mm256_shufflelo_epi16)
- vpslldq (_mm256_slli_si256)
- vpsrldq (_mm256_srli_si256)
- vpunpckhwd (_mm256_unpackhi_epi16)
- vpunpckhdq (_mm256_unpackhi_epi32)
- vpunpckhqdq (_mm256_unpackhi_epi64)
- vpunpckhbw (_mm256_unpackhi_epi8)
- vpunpcklwd (_mm256_unpacklo_epi16)
- vpunpckldq (_mm256_unpacklo_epi32)
- vpunpcklqdq (_mm256_unpacklo_epi64)
- vpunpcklbw (_mm256_unpacklo_epi8)
SSE: Generating mask where n leading (trailing) bytes are set
Informal specification:
Read more ...
__m128i mask_lower(const unsigned n) { assert(n < 16); switch (n) { case 0: return {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; case 1: return {0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; case 2: return {0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; // ... case 14: return {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00}; case 15: return {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; } }
__m128i mask_higher(const unsigned n) { assert(n < 16); return ~mask_lower(15 - n); }
Read more ...
Subskrybuj:
Posty (Atom)