May 23, 2010. C++ compiler shootout

I’ve been curious for some time just how differently various C++ compilers might perform on a real-world code base, how much improvement one can expect over time, etc. Today, I finally ran a benchmark on that.

The compilers used were GNU gcc 3.4.6 (pretty much the oldest you can expect these days, but still found in the wild, e.g. on the CentOS 4.7 box I used for the benchmarks); GNU gcc 4.5.0 (bleeding edge, built from source); and Intel icc 11.1. Hardware was 2x dual-core Xeon 3.6 GHz, for a total of 4 cores.

Benchmarking just the single-threaded performance wouldn’t have given me enough data for a nice graph, so I’ve also benched different thread counts. The load-testing script (in Python) was running on the same machine. The script fires N threads and keeps them running for 20 seconds, then counts the total query throughput. Index data was fully cached in RAM, as we’re curious about CPU performance after all. As usual, I did 3 runs and picked the best one. The variance between runs was pretty low, normally within 1%. The results were as follows.
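As an aside on methodology, the load-testing loop described above can be sketched roughly like this. It is a minimal illustration in modern Python 3, not the actual script used for the benchmark: the names run_load_test and best_of are made up for the example, and query_fn stands in for whatever sends a single search query over a persistent connection.

```python
import threading
import time


def run_load_test(query_fn, n_threads, duration_sec=20.0):
    """Call query_fn from n_threads threads for duration_sec; return total qps.

    query_fn is a hypothetical callable that issues exactly one query.
    """
    counts = [0] * n_threads      # one slot per worker, so no lock is needed
    start = threading.Event()     # released once all workers are spawned
    deadline = [0.0]              # filled in right before the start signal

    def worker(idx):
        start.wait()
        done = 0
        while time.monotonic() < deadline[0]:
            query_fn()
            done += 1
        counts[idx] = done

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()

    began = time.monotonic()
    deadline[0] = began + duration_sec
    start.set()
    for t in threads:
        t.join()

    elapsed = time.monotonic() - began
    return sum(counts) / elapsed


def best_of(runs, query_fn, n_threads, duration_sec=20.0):
    """Repeat the test several times and keep the best throughput."""
    return max(run_load_test(query_fn, n_threads, duration_sec)
               for _ in range(runs))
```

Something like best_of(3, send_query, 4) would then correspond to one cell of the results table, with send_query being your own query function.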

And here goes the raw qps (queries-per-second) data:

threads  gcc-3.4.6  gcc-4.5.0  icc-11.1
1        131.3      140.9      137.5
2        261.8      283.3      276.9
3        277.2      313.5      303.6
4        314.3      345.3      337.9
5        311.7      340.6      329.3
6        312.3      340.7      334.2
7        317.9      344.1      334.3
8        300.2      335.0      321.2

The performance maxes out at 4 threads, as expected, given that the machine has 4 cores. However, scaling from 1 to 2 threads yields an almost precisely 2.0x performance boost, while scaling from 2 to 4 threads yields only about 1.2x. That, I suspect, is caused by not-really-independent cores on that particular machine (think HyperThreading) and should vary depending on the hardware.
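The scaling factors quoted above are easy to re-derive from the raw qps table. Here is a small sketch that does so; the QPS dictionary just restates the table, and the scaling helper is a name invented for this example.

```python
COMPILERS = ("gcc-3.4.6", "gcc-4.5.0", "icc-11.1")

# threads -> qps per compiler, copied from the table above
QPS = {
    1: (131.3, 140.9, 137.5),
    2: (261.8, 283.3, 276.9),
    3: (277.2, 313.5, 303.6),
    4: (314.3, 345.3, 337.9),
    5: (311.7, 340.6, 329.3),
    6: (312.3, 340.7, 334.2),
    7: (317.9, 344.1, 334.3),
    8: (300.2, 335.0, 321.2),
}


def scaling(from_threads, to_threads):
    """Throughput ratio when going from one thread count to another."""
    return {
        name: QPS[to_threads][i] / QPS[from_threads][i]
        for i, name in enumerate(COMPILERS)
    }


for a, b in ((1, 2), (2, 4)):
    factors = scaling(a, b)
    line = ", ".join(f"{name}: {f:.2f}x" for name, f in factors.items())
    print(f"{a} -> {b} threads: {line}")
```

For all three compilers this comes out at roughly 2.0x for the first doubling and roughly 1.2x for the second.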

The performance differences from compiler to compiler are noticeable but not really huge. The baseline is, unsurprisingly, ye good olde gcc 3.4.6. Bleeding edge gcc 4.5.0 generally beats it by 7% to 10%, peaking at 13% (at 3 threads). Interestingly enough, gcc 4.5.0 also consistently beats icc 11.1 by 2% to 4% on its native Intel gear. Now, icc-generated code performance might depend on the settings a whole lot, so I’ve also played with additional optimization settings a little bit (platform-specific SSE code, vectorization, inlining, etc.), but to no visible effect.
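For anyone who wants to check the percentages, here is a short sketch that computes the relative gains per thread count from the raw table. The RESULTS list merely restates the table, and gain_pct is a helper name made up for the example.

```python
# (threads, gcc-3.4.6 qps, gcc-4.5.0 qps, icc-11.1 qps), copied from the table
RESULTS = [
    (1, 131.3, 140.9, 137.5),
    (2, 261.8, 283.3, 276.9),
    (3, 277.2, 313.5, 303.6),
    (4, 314.3, 345.3, 337.9),
    (5, 311.7, 340.6, 329.3),
    (6, 312.3, 340.7, 334.2),
    (7, 317.9, 344.1, 334.3),
    (8, 300.2, 335.0, 321.2),
]


def gain_pct(new_qps, base_qps):
    """Relative throughput gain of new_qps over base_qps, in percent."""
    return (new_qps / base_qps - 1.0) * 100.0


print("threads  gcc45/gcc34  icc/gcc34  gcc45/icc")
for threads, gcc34, gcc45, icc in RESULTS:
    print(f"{threads:<7}  "
          f"{gain_pct(gcc45, gcc34):10.1f}%  "
          f"{gain_pct(icc, gcc34):8.1f}%  "
          f"{gain_pct(gcc45, icc):8.1f}%")
```

The first column is gcc 4.5.0 over the gcc 3.4.6 baseline, the second is icc 11.1 over the same baseline, and the third is gcc 4.5.0 over icc 11.1.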

So generally, gcc 3.4.6 performs pretty well despite its age, icc 11.1 improves 5% to 7% on top of that, and gcc 4.5.0 is doing a really great job, beating a compiler from a silicon vendor on its own silicon by an extra 2% to 4%.

Is it worth squeezing out those extra 7-10% by upgrading a compiler? IMO it’s a nice addition but definitely isn’t worth the hassle of upgrading the system-wide one, especially if you’re running just a handful of boxes. However, it might make sense to build (and then thoroughly test!) particular programs such as Sphinx using a new compiler. At 10 boxes, 10% savings means a free box, and that’s usually worth an extra test suite run or three.


4 Responses to “C++ compiler shootout”

  1. infracaninophile says:

    What about clang/LLVM ?

  2. shodan says:

    Infra, can it handle C++? I thought it only works with pure C.

  3. Ultimatevdn says:

    Thread behaviour looks a bit weird, which Kernel version did you use?

    There are significant changes in the 2.6.32 Kernel regarding Thread allocation.

  4. shodan says:

    @ultimatevdn, kernel was 2.6.9 (the box is running CentOS 4), but that shouldn’t matter much as the connections were persistent.
