Notes on computers, programming and all the stuff

This blog is obsolete

2018-12-10T22:46:00.001+01:00

I'm available on Twitter. The up-to-date list of my articles and short notes is available on my homepage 0x80.pl.

I used to publish announcements and short notes here. For various reasons it didn't work well. This blog probably won't be updated anymore.

GCC: inlining failed in call to always_inline 'FOO': target specific option mismatch

2016-05-01T10:19:00.002+02:00

AVX512 comes in a number of variants, and a compiler must know which AVX512 extensions it compiles for.

The GCC error "inlining failed in call to always_inline 'FOO': target specific option mismatch" occurs when a program uses SIMD intrinsics and the compiler has wrong or missing target options. Target options are passed with switches starting with "-m".

Let's look at a real-world example of the error:

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlbwintrin.h:790:1: error: inlining failed in call to always_inline ‘_mm_movm_epi8’: target specific option mismatch

When we open avx512vlbwintrin.h, we see at the beginning of the file:

...
#pragma GCC push_options
#pragma GCC target("avx512vl,avx512bw")
#define __DISABLE_AVX512VLBW__
...

Thus, in order to compile the program, GCC has to be fed the two options listed in the target pragma: -mavx512vl and -mavx512bw.
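Passing -mavx512vl -mavx512bw lets the compiler use AVX512 anywhere in the translation unit. GCC (and Clang) also accept the same target string per function via __attribute__((target(...))). The sketch below, with hypothetical function names, compiles only one function for AVX512VL+AVX512BW, so _mm_movm_epi8 inlines there without the mismatch error, and guards the call with __builtin_cpu_supports so the program still runs on CPUs without AVX512. It assumes x86-64 and a reasonably recent GCC or Clang (old compilers don't accept the AVX512 feature names in __builtin_cpu_supports).

```cpp
#include <immintrin.h>
#include <cstdint>

// Only this function is compiled for AVX512VL+AVX512BW, regardless of the
// global -m flags, so _mm_movm_epi8 can be inlined here.
__attribute__((target("avx512vl,avx512bw")))
static void mask_to_bytes_avx512(uint16_t mask, uint8_t out[16]) {
    const __m128i v = _mm_movm_epi8(static_cast<__mmask16>(mask));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}

// Portable fallback: byte i is 0xFF when bit i of mask is set, else 0x00.
static void mask_to_bytes_scalar(uint16_t mask, uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = ((mask >> i) & 1) ? 0xFF : 0x00;
}

// Hypothetical dispatcher: run the AVX512 code only on CPUs that support it.
void mask_to_bytes(uint16_t mask, uint8_t out[16]) {
    if (__builtin_cpu_supports("avx512vl") && __builtin_cpu_supports("avx512bw"))
        mask_to_bytes_avx512(mask, out);
    else
        mask_to_bytes_scalar(mask, out);
}
```

This file compiles with plain g++ (no -m switches), because the target options are attached to the one function that needs them.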

bash: $0 value

2016-02-13T10:43:00.000+01:00

When a script is run from the command line, the 0th parameter ($0) is the script's name. However, when a script is run via the source command, the 0th parameter is the shell's name. Weird, but true.

$ cat test.sh 
echo "\$0 is $0"

$ bash test.sh
$0 is test.sh

$ source ./test.sh
$0 is bash

Base64 decoding with SIMD instructions

2016-01-17T10:42:00.001+01:00

Base64 decoding could also be vectorized, although the speedup is not very impressive, merely 35%. Read more ...

Base64 encoding with SIMD instructions

2016-01-12T16:15:00.000+01:00

SSE code is more than 2 times faster on Core i7, and around 70% faster on Core i5. Read more...

Fast conversion of floating-point values to string

2015-12-29T21:00:00.001+01:00

The conversion to string could be 15 times faster than sprintf. Read more...

Base64 encoding — implementation study

2015-12-27T19:27:00.000+01:00

Although base64 encoding is a very basic algorithm, it can be sped up a little (25% sounds good?). Read more...
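For reference, a straightforward scalar encoder, the kind of baseline such speedups are measured against, might look as follows. It is a minimal sketch using the standard RFC 4648 alphabet with '=' padding, not the optimized code from the article: every 3 input bytes form a 24-bit word that is split into four 6-bit indices.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

std::string base64_encode(const std::string& input) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((input.size() + 2) / 3 * 4);
    size_t i = 0;
    // Main loop: 3 bytes -> 24-bit word -> 4 characters.
    for (; i + 3 <= input.size(); i += 3) {
        const uint32_t w = uint32_t(uint8_t(input[i])) << 16 |
                           uint32_t(uint8_t(input[i + 1])) << 8 |
                           uint32_t(uint8_t(input[i + 2]));
        out += alphabet[(w >> 18) & 0x3F];
        out += alphabet[(w >> 12) & 0x3F];
        out += alphabet[(w >> 6) & 0x3F];
        out += alphabet[w & 0x3F];
    }
    // Tail: 1 or 2 leftover bytes are padded with '='.
    const size_t rest = input.size() - i;
    if (rest > 0) {
        uint32_t w = uint32_t(uint8_t(input[i])) << 16;
        if (rest == 2) w |= uint32_t(uint8_t(input[i + 1])) << 8;
        out += alphabet[(w >> 18) & 0x3F];
        out += alphabet[(w >> 12) & 0x3F];
        out += rest == 2 ? alphabet[(w >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}
```

For example, base64_encode("foobar") gives "Zm9vYmFy", one of the RFC 4648 test vectors.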

Benefits from the obsession

2015-12-26T22:24:00.000+01:00

Everything started a few years ago when I found John Regehr's blog. If you don't know the blog, I highly recommend it. Among other things (I like the photos!) the author studies bugs in compilers, undefined behaviour and similar things. The word "overflow" appears quite often in his posts due to the great number of errors related to improper use of integer arithmetic. Well, I don't know exactly when the obsession started, but recently I realized that I am alert to every integer operation in my programs. Read more...
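In practice, being alert to integer operations means checking for overflow instead of assuming it can't happen. A minimal sketch with hypothetical function names: checked_add relies on the GCC/Clang builtin __builtin_add_overflow (not standard C++), while checked_add_portable performs the range test in standard C++ before adding, so the signed overflow (undefined behaviour) never happens.

```cpp
#include <cstdint>

// Returns false (and leaves *result unspecified) when a + b overflows int32_t.
bool checked_add(int32_t a, int32_t b, int32_t* result) {
    return !__builtin_add_overflow(a, b, result);
}

// Same check in standard C++: test the bounds first, add only when safe.
bool checked_add_portable(int32_t a, int32_t b, int32_t* result) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return false;
    *result = a + b;
    return true;
}
```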

Implicit conversion - the enemy

2015-11-28T14:02:00.001+01:00

I wrote:

    result += string_utils::pad_left(string, '0');

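I forgot that the signature of pad_left is (string, int, char) and that the char parameter has a default value. So '0' was silently converted to the int width (48 in ASCII), and the padding character took its default value. My mistake, no doubt.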

This is another example of the dark side of implicit conversions. C++ converts between characters and integers seamlessly, although these two beasts are different in nature. Of course, characters are represented by numbers, but that's an implementation detail.

One could say: you made a mistake and now you blame the language. No, I blame the language's design. I'm afraid that we will end up with something like Integer and int to overcome such problems.

Lesson learned: never use default parameters in a public API (surprise!)
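Besides avoiding default parameters, C++11 offers a way to make this particular mistake fail at compile time. The sketch below is a hypothetical reconstruction of pad_left with the signature described above, plus a deleted overload: a char second argument is an exact match for the deleted function, so overload resolution picks it and compilation stops, while int widths still select the real function.

```cpp
#include <cstddef>
#include <string>

namespace string_utils {

// Hypothetical: pad `s` on the left with `fill` until it is `width` characters long.
std::string pad_left(const std::string& s, int width, char fill = ' ') {
    if (static_cast<int>(s.size()) >= width)
        return s;
    return std::string(static_cast<size_t>(width) - s.size(), fill) + s;
}

// Catches the mistaken call pad_left(s, '0'): char -> char is an exact match,
// better than the char -> int promotion, so this deleted overload is chosen.
std::string pad_left(const std::string& s, char fill) = delete;

}  // namespace string_utils
```

With the deleted overload in place, string_utils::pad_left(string, '0') is rejected by the compiler instead of silently padding to 48 characters.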

Another C++ nasty feature

2015-11-22T15:34:00.000+01:00

I'm fond of C++ weirdness, really. This language is full of traps, and it shocks me once in a while.

Let's look at this piece of code, a part of a larger module:

void validate_date() {

    // ...

    boost::optional<unsigned> clock_hour;
    boost::optional<unsigned> am_pm_clock;
    
    // ... fill these fields

    if (some sanity check failed) {
        
        report_error("user has entered wrong time: %d %s",
            *clock_hour
            *am_pm_clock ? "AM" : "PM");
    }
}

We would expect that in case of an error the following line is reported: "user has entered wrong time: 123 PM". Obvious. But please look closer at the code: do you see any mistake? There is one... dirty... hard to notice. I'll give you a minute.

So, the mistake is the missing comma between the expressions *clock_hour and *am_pm_clock. However, the code is valid! It compiles! And it took me a little longer than a minute to understand what happened. The explanation is:

*clock_hour evaluates to an expression of type unsigned;
then the compiler sees *, the multiplication operator;
so it checks whether multiplying unsigned (on the left side) by boost::optional<unsigned> (on the right side) is possible;
it is, because boost::optional<T> has a conversion operator to type T.

We can rewrite the whole expression; now it should be clear:

    ((*clock_hour) * unsigned(am_pm_clock)) ? "AM" : "PM"

As a result, report_error is called with only one argument after the format string, and that argument has type const char*.

It's bizarre, it's terrible. A language should help a programmer. In my opinion, implicit conversions are the worst feature of C++.
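The trap can be reproduced without Boost. The sketch below uses a hypothetical Wrapped type that has a dereference operator and an implicit conversion to unsigned, matching the mechanism in the explanation; the missing comma again turns two arguments into a single multiplication feeding the ?: operator.

```cpp
// Hypothetical optional-like wrapper with an implicit conversion to unsigned.
struct Wrapped {
    unsigned value;
    unsigned operator*() const { return value; }   // dereference, like optional
    operator unsigned() const { return value; }    // implicit conversion
};

const char* am_pm(Wrapped clock_hour, Wrapped am_pm_clock) {
    // Missing comma: parsed as
    //   ((*clock_hour) * unsigned(am_pm_clock)) ? "AM" : "PM"
    return *clock_hour
           *am_pm_clock ? "AM" : "PM";
}
```

Declaring the conversion as explicit operator unsigned() const makes the same line fail to compile, which is one way to stop such typos from slipping through.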

Short report from code::dive 2015

2015-11-15T16:48:00.000+01:00

A few days ago I attended code::dive 2015, an IT conference in Wrocław, Poland. It was a one-day conference with a great number of presentations: four stages and five sessions, 20 talks in total. Impressive number! But each attendee had to choose their own path of just five talks, and I think the decision was pretty difficult. Sometimes less is better. Read more

C++ magick

2015-07-15T20:15:00.003+02:00

A programmer wrote:

class container;

class IndexOutOfBounds {
public:
    IndexOutOfBounds(const std::string& msg);
};

void container::remove(int index) {

    if (index < 0 || index >= size()) {
        throw new IndexOutOfBounds("Invalid index: " + index);
    }

    // the rest of method
}

Do you see the mistake? The programmer assumed that the expression "Invalid index: " + index evaluates to std::string("Invalid index: <some number>").

In fact, the type of the expression "Invalid index: " is const char[16] (15 characters plus the terminating NUL), so adding an integer to it decays the array to const char* and performs pointer arithmetic. For an index in the range [0, 15] the exception carries the tail of the message; for example, when index=10 it is "dex: ". But for indexes larger than 15 or less than 0 the behaviour is undefined, and the program will likely crash.

This is why I hate C++: the language has many dark corners, stupid conventions and implicit conversions, not to mention UB ("just" 150 kinds of UB, if you're curious).
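For comparison, one way to write what the programmer meant: build the message with std::to_string and throw the exception by value (throw new throws a pointer that somebody has to delete). The container below is a hypothetical minimal stand-in, and std::out_of_range replaces the custom IndexOutOfBounds class.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container standing in for the one from the snippet.
class container {
public:
    explicit container(std::vector<int> items) : items_(std::move(items)) {}

    int size() const { return static_cast<int>(items_.size()); }

    void remove(int index) {
        if (index < 0 || index >= size()) {
            // const char* + std::string concatenates, unlike const char* + int.
            throw std::out_of_range("Invalid index: " + std::to_string(index));
        }
        items_.erase(items_.begin() + index);
    }

private:
    std::vector<int> items_;
};
```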

Implementation of BT-trees

2015-06-20T08:37:00.001+02:00

Great paper by Lars F. Bonnichsen, Christian W. Probst, Sven Karlsson:

This document presents the full implementation details of BT-trees, a highly efficient ordered map, and an evaluation which compares BT-trees with unordered maps. BT-trees are often much faster than other ordered maps, and have comparable performance to unordered map implementations. However, in benchmarks which favor unordered maps, BT-trees are not faster than the fastest unordered map implementations we know of.

Boolean function for the rescue

2015-06-20T08:06:00.000+02:00

The problem is defined as follows: a set of features is saved using bit-sets (usually large), and there is a list/map/whatever of sets containing features of different objects. We have to find which features are unique. Read more...
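Assuming "unique" means a feature present in exactly one object's set, a boolean-function approach needs just two running bit-sets: features seen at least once, and features seen at least twice. This is a minimal sketch under that assumption, with bit-sets stored as equal-length vectors of 64-bit words; it is not necessarily the formula from the article.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitset = std::vector<uint64_t>;  // all sets assumed to have the same length

Bitset unique_features(const std::vector<Bitset>& sets) {
    if (sets.empty()) return {};
    const size_t n = sets[0].size();
    Bitset once(n, 0);   // feature seen in at least one set
    Bitset twice(n, 0);  // feature seen in at least two sets
    for (const Bitset& s : sets) {
        for (size_t i = 0; i < n; i++) {
            twice[i] |= once[i] & s[i];  // already seen, and seen again
            once[i]  |= s[i];
        }
    }
    Bitset result(n);
    for (size_t i = 0; i < n; i++)
        result[i] = once[i] & ~twice[i];  // seen exactly once
    return result;
}
```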

Big progress in verification

2015-06-10T08:53:00.001+02:00

Formal verification is not an easy task. For example, the CompCert compiler comes with a machine-checked proof that its optimizations preserve the semantics of a program. The paper "Verified correctness and security of OpenSSL HMAC" describes verification of the whole "stack":

We have proved, with machine-checked proofs in Coq, that an OpenSSL implementation of HMAC with SHA-256 correctly implements its FIPS functional specification and that its functional specification guarantees the expected cryptographic properties. This is the first machine-checked cryptographic proof that combines a source-program implementation proof, a compiler-correctness proof, and a cryptographic-security proof, with no gaps at the specification interfaces.

Fast exact summation using small and large superaccumulators

2015-05-22T12:58:00.001+02:00

Interesting article by Radford M. Neal:

I present two new methods for exactly summing a set of floating-point numbers, and then correctly rounding to the nearest floating-point number. Higher accuracy than simple summation (rounding after each addition) is important in many applications, such as finding the sample mean of data.
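Neal's superaccumulators give an exact, correctly rounded sum. A much simpler technique, Neumaier's variant of Kahan compensated summation (not the method from the paper, and not exact), already shows how much information rounding after each addition throws away.

```cpp
#include <cmath>
#include <vector>

// Simple summation: rounds after each addition.
double naive_sum(const std::vector<double>& xs) {
    double s = 0.0;
    for (double x : xs) s += x;
    return s;
}

// Neumaier's compensated summation: `c` collects the low-order bits lost by
// each rounded addition. Must not be compiled with -ffast-math, which may
// optimize the compensation away.
double neumaier_sum(const std::vector<double>& xs) {
    double s = 0.0, c = 0.0;
    for (double x : xs) {
        const double t = s + x;
        if (std::fabs(s) >= std::fabs(x))
            c += (s - t) + x;  // low-order bits of x were lost
        else
            c += (x - t) + s;  // low-order bits of s were lost
        s = t;
    }
    return s + c;
}
```

For the input {1e100, 1.0, -1e100}, naive_sum returns 0.0 while neumaier_sum returns 1.0.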

Optimizing Dijkstra for real-world performance

2015-05-20T10:10:00.003+02:00

Another interesting paper:

Our experimental results currently put our prototype implementation at about twice as fast as the Boost implementation of the algorithm on both real-world and generated large graphs. Furthermore, this preliminary implementation was written in only a few weeks, by a single programmer. The fact that such an early prototype compares favorably against Boost, a well-known open source library developed by expert programmers, gives us reason to believe our design for the queue is indeed better suited to the problem at hand, and the favorable time measurements are not a product of any specific implementation technique we employed.

The Influence of Malloc Placement on TSX Hardware Transactional Memory

2015-04-21T07:49:00.000+02:00

Interesting paper:

We show that the placement policies of dynamic storage allocators -- such as those found in common "malloc" implementations -- can influence the L1 conflict miss rate in the L1. Conflict misses -- sometimes called mapping misses -- arise because of less than ideal associativity and represent imbalanced distribution of active memory blocks over the set of available L1 indices. Under transactional execution conflict misses may manifest as aborts, representing wasted or futile effort instead of a simple stall as would occur in normal execution mode.

Converting numbers to binary ASCII representation - new method

2015-04-19T17:44:00.002+02:00

Recently I've checked different methods to convert numbers to binary representation, including the use of the new PDEP instruction from the BMI2 extension.

Today I updated the article with a new SWAR version 2, a tricky use of multiplication. The method is not faster, but I like the approach: under certain conditions multiplication can be seen as a multi-shift/bit-or instruction. I've already used multiplication in this way to emulate the pmovmskb instruction.
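To illustrate multiplication acting as a multi-shift/bit-or, here is one possible SWAR conversion of a byte into 8 ASCII digits. It is a sketch, not necessarily the "version 2" from the article, and it assumes a little-endian target (e.g. x86). Multiplying by 0x0101010101010101 copies the byte into all 8 byte lanes; since the value is below 256, no carries cross lanes.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

std::string byte_to_binary(uint8_t x) {
    // Multiply = shift by 0, 8, 16, ... 56 and OR the results together.
    uint64_t v = uint64_t(x) * 0x0101010101010101ULL;
    // Byte k keeps only bit (7 - k), so byte 0 holds the most significant bit.
    v &= 0x0102040810204080ULL;
    // Each byte is 0 or a single bit; adding 0x7F sets bit 7 iff the byte is
    // non-zero and never carries into the next byte. Move bit 7 down to bit 0.
    v = ((v + 0x7F7F7F7F7F7F7F7FULL) >> 7) & 0x0101010101010101ULL;
    // Map 0/1 to the characters '0'/'1'.
    v += 0x3030303030303030ULL;
    char buf[8];
    std::memcpy(buf, &v, sizeof buf);  // little-endian: byte 0 comes first
    return std::string(buf, sizeof buf);
}
```

For example, byte_to_binary(5) yields "00000101".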

Speeding up bit-parallel population count

2015-04-13T20:59:00.000+02:00

Nearly 50% faster than the naive version for large data sets. Discovered by accident. :)
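The post doesn't describe the faster method. For reference, the classic bit-parallel (SWAR) population count sums bits in 2-, 4- and 8-bit fields and then adds all bytes with one multiplication. This is the textbook form, not the accidental speedup; popcount_array is a hypothetical helper for counting over a buffer.

```cpp
#include <cstddef>
#include <cstdint>

int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
    return int((x * 0x0101010101010101ULL) >> 56);  // add all 8 bytes into the top one
}

uint64_t popcount_array(const uint64_t* data, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += popcount64(data[i]);
    return total;
}
```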

Github repositories

2015-04-09T21:09:00.000+02:00

I've put the source code for my two articles on GitHub:

The repositories contain the original code (C99, 32-bit, for GCC with inline assembly) and also new programs in C++11 using intrinsics, tested in a 64-bit environment.

BTW, the article about popcount has gained popularity, and I hope my other crazy idea, hacking MPSADBW, will spread all over the world.

SIMD-ized searching in unique constant dictionary

2015-04-09T20:48:00.004+02:00

The problem: there is an ordered dictionary containing only unique keys. The dictionary is read-only, and keys are 32-bit (SSE) or 64-bit (AVX2). Read more
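A minimal SSE2 sketch of the 32-bit case, not the method from the article: compare four keys per step against a broadcast needle, and use the ordering to stop early once a block's largest key exceeds the needle. simd_find is a hypothetical name; __builtin_ctz is a GCC/Clang builtin.

```cpp
#include <emmintrin.h>  // SSE2
#include <cstddef>
#include <cstdint>

// Returns the index of `key` in the sorted, unique keys[0..n), or -1.
long simd_find(const int32_t* keys, size_t n, int32_t key) {
    const __m128i needle = _mm_set1_epi32(key);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const __m128i block = _mm_loadu_si128(reinterpret_cast<const __m128i*>(keys + i));
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi32(block, needle));
        if (mask != 0)
            // A matching 32-bit lane sets 4 mask bits; keys are unique,
            // so the lowest set bit divided by 4 is the lane index.
            return long(i) + __builtin_ctz(mask) / 4;
        if (keys[i + 3] > key)  // sorted: the key cannot appear later
            return -1;
    }
    for (; i < n; i++)  // scalar tail
        if (keys[i] == key) return long(i);
    return -1;
}
```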

SIMD: detecting a bit pattern

2015-03-22T19:32:00.000+01:00

The problem: there are 64-bit values with some data bits and some metadata bits; the metadata includes a k-bit field describing a "type" (k >= 0). The type field is located in the lower 32 bits.

The procedure processes two "types", one denoted with code 3 and another with code 5. When all items are of type 3 we can use a fast AVX2 path; if some items are of type 5, we have to call an additional function (a virtual method, to be precise). Read more ...
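Before choosing a path, the procedure has to know whether every item has type 3. A branch-free scalar sketch: XOR each item's masked type field with the code 3 and OR the results together, so any other type leaves a non-zero bit. The field position (TYPE_SHIFT, TYPE_BITS) is a hypothetical choice, since the post doesn't give one, and the AVX2 version in the article may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: the type field occupies bits [TYPE_SHIFT, TYPE_SHIFT + TYPE_BITS).
constexpr unsigned TYPE_SHIFT = 8;
constexpr unsigned TYPE_BITS  = 4;
constexpr uint64_t TYPE_MASK  = ((uint64_t(1) << TYPE_BITS) - 1) << TYPE_SHIFT;
constexpr uint64_t TYPE_3     = uint64_t(3) << TYPE_SHIFT;

// True when every item has type 3 (and for an empty range).
bool all_type3(const uint64_t* items, size_t n) {
    uint64_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (items[i] & TYPE_MASK) ^ TYPE_3;  // non-zero iff type != 3
    return diff == 0;
}
```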

Compiler warnings are your future errors

2015-03-22T11:04:00.002+01:00

Months ago I was asked to upgrade GCC from version 4.7 to 4.9 and also to clean up the configure scripts. Not very exciting, merely a time-consuming task. Oh, we were hit by a bug in libstdc++, but a simple patch fixed the problem. A few weeks later I was asked to change the GCC switch from -std=c++11 to -std=c++14: the easiest task in the world. I had to modify a single script, run configure, type make, then run the tests... everything was OK. Quite boring so far. Read more ...

AVX512: ternary functions evaluation

2015-03-22T10:02:00.000+01:00

Intel's SIMD extensions offer the following 2-argument (binary) boolean functions: and, or, xor and and-not (pandn). There is no single-argument not; this function can be expressed as xor reg, ones, but this requires an additional, pre-set register.

AVX512F will come with a very interesting instruction called vpternlog. Read more ...
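vpternlog (exposed as _mm512_ternarylogic_epi32/_epi64) takes three operands and an 8-bit immediate that is the truth table of the desired function: for every bit position, the bits of a, b and c form a 3-bit index into the immediate. The scalar model below is a sketch for experimenting without AVX512 hardware. The immediate for any expression f(a, b, c) is f evaluated on the constants 0xF0, 0xCC and 0xAA; this also gives the single-argument not that SSE lacks.

```cpp
#include <cstdint>

// Scalar model of vpternlog on one 64-bit word.
uint64_t ternlog(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8) {
    uint64_t result = 0;
    for (int bit = 0; bit < 64; bit++) {
        // a supplies the most significant bit of the index, c the least.
        const unsigned index = ((a >> bit) & 1) << 2 | ((b >> bit) & 1) << 1 | ((c >> bit) & 1);
        result |= uint64_t((imm8 >> index) & 1) << bit;
    }
    return result;
}

// Truth tables obtained by evaluating the expression on 0xF0, 0xCC, 0xAA.
constexpr uint8_t imm_majority = (0xF0 & 0xCC) | (0xF0 & 0xAA) | (0xCC & 0xAA);  // 0xE8
constexpr uint8_t imm_xor3     = 0xF0 ^ 0xCC ^ 0xAA;                             // 0x96
constexpr uint8_t imm_not_a    = uint8_t(~0xF0);                                 // 0x0F
```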