Speed Test 2: Comparing C++ Compilers on Windows

Hands, typing. Always with the typing.

In our previous article, we compared a few C++ compilers on Linux. This time we’re going to perform a similar set of tests for Windows.

One notable aspect of using C++ on Windows is that you’re typically encouraged to use an IDE such as Microsoft Visual Studio, or a competing IDE such as Embarcadero’s RAD Studio (which rose from the ashes of the old Borland products, Delphi and C++ Builder). The Intel compiler can be used in a standalone fashion, but it also integrates into Visual Studio (though not the free Express versions of Visual Studio); there are also some very nice, free IDEs such as CodeLite and Code::Blocks. But to test the compilers themselves, I removed the IDE variable and focused only on the command line.

Like all things Windows, it can get costly doing C++ development in this environment. However, there are a few notable exceptions:

  • The free and open-source Cygwin system includes a build of the g++ compiler, which I’m including in these tests.
  • The free and open-source MinGW also includes a build of the g++ compiler, which I’m including in these tests.
  • Microsoft makes its Express versions of Visual Studio available for free, including the C++ version. While these are greatly scaled-down, the underlying compiler is the same as the one used in the premium versions of Visual Studio, and I’m including the premium version in these tests.
  • Embarcadero, which produces a product we’re comparing today called RAD Studio, also has a free compiler called Borland C++ 5.5. I’ll have more to say about this product shortly. Short version: Skip it. It’s worthless. But the compiler they ship with RAD Studio proved to be quite impressive.

For the premium compilers, today I’m testing:

  • Intel C++ Compiler, as part of their Parallel Studio XE 2013
  • Microsoft C++ Compiler, as part of Visual Studio
  • Embarcadero C++ 6.70 Compiler that ships with RAD Studio and C++ Builder.

In all cases I’m using the 64-bit versions of the compilers. But let’s be clear about something: When I say 64-bit compiler, I’m talking about the generated code. The compiler itself may or may not be a 64-bit application.
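
One way to confirm what a compiler actually targets, regardless of how the compiler executable itself was built, is to let the generated code report its own pointer width. The following is only a minimal sketch; the helper names target_pointer_bits and targets_64_bit are hypothetical and are not part of the test file used in this article.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// Hypothetical helper: width in bits of a pointer in the *generated* code.
// A 64-bit target yields 64 even when the compiler executable itself
// happens to be a 32-bit application.
constexpr std::size_t target_pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}

// Hypothetical helper: true when this translation unit was compiled
// for a 64-bit target.
constexpr bool targets_64_bit() {
    return target_pointer_bits() == 64;
}
```

Because both helpers are constexpr, a build could reject the wrong target at compile time with a static_assert on targets_64_bit().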

A Short Aside About Installing Embarcadero’s RAD Studio

Many of the names in the C++ Builder portion of the RAD Studio installation options take me back to the golden age of computers, when the web was brand new and few people had yet heard the letters “www.” I’m talking about the mid 1990s, when I used things such as Interbase and the Borland Database Engine, as well as the VCL controls; these are all available today in the RAD Studio installer, apparently left around to gum up the machinery when Borland dumped its dead dinosaurs on Embarcadero’s front porch when the latter bought the aging technology known as C++ Builder.

The installer proudly proclaims that you can build powerful, amazing software with, and I quote, “blazing native performance.” But to use the installer, you have to let it first install Microsoft JSharp Runtime 2.0, which was released six years ago in 2007, and later killed off by Microsoft. Oh, and if you’re interested in using this proud RAD Studio tool for distributed programming with this newfangled thing called “The Internet,” it even supports CORBA, the Common Object Request Broker Architecture. I last fussed with CORBA in 1998. Just hearing the name is a flashback to the ’90s when I sat in a lawn chair at Lollapalooza, drank beer, and listened to brand-new bands named Smashing Pumpkins and Green Day.

And let’s not forget that Embarcadero also offers a free C++ compiler for Windows called BCB32. The download includes an installer that was copyrighted in 2000 by Inprise; the accompanying text brags that it’s a powerful “ANSI” compiler and that it’s “the high performance foundation and core technology of Embarcadero’s award-winning C++Builder product line,” and that it includes an “ANSI/ISO Standard Template Library.” I’m not going to review this standalone compiler here. Instead I can show it to my son and explain what life was like in the olden days.

One odd little side-note before we move on: I was checking out the C++11 features available in Embarcadero’s C++ compiler. In one of their blogs, the company says, “C++11 support by BCC64 is based on Clang 3.1; for more information, see http://clang.llvm.org/cxx_status.html.” And indeed, there was a clang executable in their bin directory. I tried to run it, but it would just freeze for a long time before finally doing anything. I’m not sure what was happening there; as such, I used the other one I downloaded separately.

LLVM

There’s a semi-official build of the LLVM compiler that integrates with Visual Studio. I installed it and nothing would compile within Visual Studio; I received errors when it tried to compile Visual Studio’s own header files. But from the command line, I was able to set up the path and other environment variables, and after that it worked just fine.

The Tests

The last time I published tests of C++ compilers, some readers asked why I would bother measuring compile times. So let’s address that right now. First, I personally have no real use for the compile times, and clearly some readers didn’t, either.

But some organizations do, for various reasons. If they determine that two compilers meet all their needs and are essentially equal in every aspect, they will probably consider other factors to help them decide which one to settle on. Price is one possible factor. But if they’re compiling thousands of files daily, they may want to make sure the build completes in a reasonable timeframe. As we saw in my last article, the difference in compile times wasn’t huge, but the differences were still there. So I’m mostly offering the time-data here for completeness’ sake.

In order to test how long a process takes, we ideally want to turn off extraneous software and services that might interfere with the process. With Linux, you can turn off most services and still have a command line. With Windows, well, that’s nearly impossible: shutting down these “essential” services will usually halt the system. Fortunately, we can get around the problem by using a system with multiple cores. Windows doesn’t offer a simple way to reserve a core exclusively for a particular process, but if you have, for example, a quad core with hyperthreading, there’s a good chance your time-consuming process will get its own core. Also, by letting the usual system processes do their thing while we test, we end up with numbers that are more realistic for an actual development machine.

In each case, I did a full compile to build an executable, and noted the file size. Then I cleaned and did a “compile only,” recording the time as I did so. I repeated the compile-only step three times, and recorded all three times. In a couple of instances, when something strange happened (such as the compile taking an extraordinarily long time, probably because some other process interfered), I considered it a statistical outlier and didn’t include it in the results, instead doing an additional timing to replace the outlier.
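
The results below list all three raw timings for each configuration. If you want a single number per configuration, one reasonable option is the median, which is less sensitive to a stray slow run than the average. This is a sketch of that idea, not the method used for this article; the helper name median_ms is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of a small set of timings in milliseconds.
// The vector is taken by value so the caller's ordering is left untouched.
double median_ms(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    if (samples.size() % 2 == 1) {
        return samples[mid];  // odd count: the middle element
    }
    return (samples[mid - 1] + samples[mid]) / 2.0;  // even count: mean of the two middle elements
}
```

For example, calling median_ms({3134.0215, 2936.2765, 2941.5167}) on the three unoptimized bcc64 timings reported in the results returns 2941.5167.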

To measure the time, I tried to find an equivalent to the Linux time command. There isn’t really one in Windows. Cygwin includes one, but I didn’t want to try to get all these compilers configured under Cygwin. Instead, I used Windows PowerShell. It includes Measure-Command, which measures how long the command passed to it runs. But unlike the Linux time command, it doesn’t provide details on how much of that time is actual processor time, versus time waiting for the operating system (and so on). Instead, it seems to just measure how long it takes from the time the command launches until it finishes.
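
To make the distinction concrete, here is a minimal C++ sketch of the same kind of measurement: elapsed wall-clock time from launch to finish, with no breakdown into processor time. It is not how Measure-Command is implemented and was not used for these tests; the names wall_clock_ms, TimedRun, and time_command are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <string>

// Hypothetical helper: run any callable and return the elapsed wall-clock
// time in milliseconds. steady_clock is monotonic, so the result is never
// negative even if the system clock is adjusted during the run.
template <typename Callable>
double wall_clock_ms(Callable&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Elapsed time plus the value returned by std::system for the command.
struct TimedRun {
    double elapsed_ms;
    int exit_status;
};

// Hypothetical helper: time a shell command such as "clang -c -O3 test4.cpp".
// Like Measure-Command, this only sees launch-to-finish time.
TimedRun time_command(const std::string& command) {
    assert(!command.empty());
    int status = 0;
    const double ms = wall_clock_ms([&] { status = std::system(command.c_str()); });
    return {ms, status};
}
```

Note that time_command goes through the system shell, so the shell's own startup time is included in the measurement.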

I used the same huge C++ file I did in the previous article. I won’t repeat the explanation; instead you can read about it there.

Finally, you’ll notice I didn’t include a test with the multicore library Threading Building Blocks. For the tests today, I’m only examining the compiler times and the size of the object files and executable files. Also, I plan to look at the generated assembly code for the optimizations, and compare how the compilers do there. I’ve done quite a bit of work with assembly code, and could therefore do an analysis of the generated vectorized SIMD code, and determine which compilers offer the best support for vectorized code, as well as including code in the final executable that detects the existing SIMD features of the host processor, and runs code accordingly.

Embarcadero Compilers

Although I poked fun at some of the tools that come with Embarcadero RAD Studio, I will say that ultimately I was impressed with this compiler in terms of speed and features. I didn’t do a full test of its C++11 features, but I did try out some features such as lambda functions (not for this article, but just as a side project out of curiosity). And to my surprise, the compiler handled the C++11 features just fine. The original Borland had a history of building compilers that were known for their speed and modern features for the time. While Embarcadero RAD Studio might seem a bit silly from today’s perspective, the underlying compiler was actually quite impressive, both with its features and its speed. (Note, however, that the C++11 features were only present in the 64-bit compiler, not the 32-bit compiler.)

Here are the results:

Embarcadero bcc64 (No optimization. Default is to not include debug information.)

Command line: Measure-Command { bcc64 -S test4.cpp }

Total Milliseconds for compile only (first try): 3134.0215

Total Milliseconds for compile only (second try): 2936.2765

Total Milliseconds for compile only (third try): 2941.5167

Object file size: 4,663,371 bytes

Final executable file size: 770,657 bytes

Embarcadero bcc64 (full optimization with command-line switch -O3)

Command line: Measure-Command { bcc64 -S -O3 test4.cpp }

Total Milliseconds for compile only: 1812.0551

Total Milliseconds for compile only: 1823.2381

Total Milliseconds for compile only: 1870.9476

Object file size: 416 bytes

Final executable file size: 59,602 bytes

LLVM clang

The clang compiler, as usual, did quite well in the tests. It wasn’t as fast as bcc64, but with its optimizations it produced small executable files.

clang (No optimization, with command-line switch -O0)

Command line: Measure-Command { clang -c -O0 test4.cpp }

Total Milliseconds for compile only (first try): 5957.0582

Total Milliseconds for compile only (second try): 5880.9037

Total Milliseconds for compile only (third try): 5851.7289

Object file size: 1,271,660 bytes

clang (Full optimization, with command-line switch -O3)

Command line: Measure-Command { clang -c -O3 test4.cpp }

Total Milliseconds for compile only: 4996.9006

Total Milliseconds for compile only: 4828.7302

Total Milliseconds for compile only: 4818.8122

Object file size: 143 bytes

Final executable file size: 3,335 bytes

64-bit MinGW

The MinGW compiler’s appeal is that it links with the Microsoft runtime libraries built into Windows, meaning you don’t have to ship any extra DLLs with your final program (and it’s also free, which is always good). After building with MinGW’s g++ compiler, you can check out the final executable using Visual Studio’s dumpbin program, passing in the /imports option. Doing so shows that the executable links at runtime with msvcrt.dll. (There’s also an interesting commentary on their site about the state of free compilers in the Windows world. I encourage you to check it out.) Also, the 64-bit version is maintained separately; I used a distribution found on SourceForge. Once installed, I pointed PowerShell’s path to the necessary executables.

g++ under 64-bit MinGW (No optimization, with command-line switch -O0)

Command line: Measure-Command { x86_64-w64-mingw32-g++.exe -O0 -c test4.cpp -o test4.o }

Total Milliseconds for compile only: 12232.63

Total Milliseconds for compile only: 12147.5664

Total Milliseconds for compile only: 12179.9838

Object file size: 3,443,685 bytes

Final executable file size: 2,112,966 bytes

g++ under 64-bit MinGW (Full optimization, with command-line switch -O3)

Command line: Measure-Command { g++ -c -O3 test4.cpp }

Total Milliseconds for compile only: 5114.4662

Total Milliseconds for compile only: 5164.4459

Total Milliseconds for compile only: 5100.8012

Object file size: 864 bytes

Final executable file size: 258,462 bytes

64-bit Cygwin

The g++ available with Cygwin works similarly to that of MinGW except, instead of linking with the runtime library that comes with Windows, it links with Cygwin’s own runtime library, cygwin1.dll (which, again, you can verify using the dumpbin utility).

In order to use this compiler, I used PowerShell and set the path to point to the distribution’s own g++ compiler.

64-bit g++ under Cygwin (No optimization)

Command line: Measure-Command { g++ -c -O0 test4.cpp }

Total Milliseconds for compile only: 14410.9207

Total Milliseconds for compile only: 14457.1481

Total Milliseconds for compile only: 14427.838

Object file size: 3,443,685 bytes

Final executable file size: 1,917,623 bytes

64-bit g++ under Cygwin (full optimization)

Command line: Measure-Command { g++ -c -O3 test4.cpp }

Total Milliseconds for compile only: 6485.9999

Total Milliseconds for compile only: 6486.0892

Total Milliseconds for compile only: 6439.0032

Object file size: 864 bytes

Final executable file size: 61,696 bytes

Microsoft C++ Compiler

This compiler can be run at the command line, although Microsoft clearly expects that most people will be using it from within an IDE, particularly Visual Studio. The command-line program is called cl.exe. To tell it not to link, pass it the /c option. There are several optimization levels, including “minimize space,” “maximize speed” and “maximum optimizations.” With both the minimize space and maximum optimizations settings, the compiled object file for the test file was very large, much larger than those produced by the other compilers. The final executable with maximum optimization was only a bit larger than the other final executables, but still larger nevertheless. (Remember, although I’m not using any #includes, to keep things like iostream out of the picture, there’s still a basic runtime that gets linked in, which includes, for example, a start function that calls main.)

Microsoft cl with Maximum optimization (Note: Testing with “minimum space” optimization only resulted in the same file sizes for both object file and executable file.)

Command line: Measure-Command { cl /Ox /c /nologo test4.cpp }

Total Milliseconds for compile only: 9378.7025

Total Milliseconds for compile only: 9444.0398

Total Milliseconds for compile only: 9536.6226

Object file size: 815,724 bytes

Final executable file size: 48,128 bytes

Microsoft cl with no optimization

Command line: Measure-Command { cl /c /nologo test4.cpp }

Total Milliseconds for compile only: 7889.2335

Total Milliseconds for compile only: 7762.6916

Total Milliseconds for compile only: 7749.9278

Object file size: 986,117 bytes

Final executable file size: 126,976 bytes

Intel Compiler

The Intel compiler occasionally “calls home” to an Intel-owned website to check licensing information. When it does so, it prints out a message about when the current license expires. I discarded the results whenever that happened, since the extra time would skew the timings. The Intel compiler also offers several options for optimizing speed, one for limiting code size, and one for, as the help states, enabling “speed optimizations, but [disabling] some optimizations which increase code size for small speed benefit.” For my test program with a huge number of templates, none of these options changed the resulting executable file size, which makes sense since they’re mostly focused on speed. We’ll test them separately.

Intel icl compiler with no optimization

Command line: Measure-Command { .\icl /Od /c test4.cpp }

Total Milliseconds for compile only: 5974.4083

Total Milliseconds for compile only: 6008.3199

Total Milliseconds for compile only: 6058.1437

Object file size: 2,391,263 bytes

Final executable file size: 175,616 bytes

Intel icl compiler with full optimization

Command line: Measure-Command { .\icl /Ox /c test4.cpp }

Total Milliseconds for compile only: 13064.7322

Total Milliseconds for compile only: 13085.5804

Total Milliseconds for compile only: 13117.1896

Object file size: 685 bytes

Final executable file size: 63,488 bytes

A Question for Readers

When I did the tests for the first article, I noticed something odd, and some readers noticed it too. For some compilers, compiling with full optimization was faster than compiling without it. I saw that this time around as well: the bcc64 and g++ compilers both exhibited this behavior, and the clang timings above show the same pattern to a lesser degree. To be frank, I’m not sure why this would happen. Optimization requires some sophisticated algorithms that analyze the code, so it seems it should take longer to compile when optimization is turned on. The generated files with optimization turned off are larger, but not so much so that it would take that much longer to write them. I’m not sure about the root cause of this situation, and I would love to hear from readers who might have honest suggestions for why this might be happening, especially readers who might have worked for a compiler company. (I’ll talk to some people I know and invite them to comment.)
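
To put numbers on that observation, the sketch below divides the unoptimized compile time by the optimized one for each compiler; a ratio above 1 means the optimized compile finished sooner. The table uses the middle value of the three timings listed for each configuration above (picked by hand; the article itself reports only the raw timings), and the names CompileTimes, kMedianTimings, and the two functions are hypothetical.

```cpp
#include <cassert>

// Middle value (in milliseconds) of the three compile-only timings
// reported above for each compiler, chosen by hand for this sketch.
struct CompileTimes {
    const char* compiler;
    double unoptimized_ms;
    double optimized_ms;
};

const CompileTimes kMedianTimings[] = {
    {"bcc64",         2941.5167,  1823.2381},
    {"clang",         5880.9037,  4828.7302},
    {"MinGW g++",    12179.9838,  5114.4662},
    {"Cygwin g++",   14427.8380,  6485.9999},
    {"Microsoft cl",  7762.6916,  9444.0398},
    {"Intel icl",     6008.3199, 13085.5804},
};

// Ratio above 1.0 means the optimized compile took less time.
double unoptimized_to_optimized_ratio(double unoptimized_ms, double optimized_ms) {
    assert(unoptimized_ms > 0.0 && optimized_ms > 0.0);
    return unoptimized_ms / optimized_ms;
}

bool optimized_compile_was_faster(const CompileTimes& t) {
    return unoptimized_to_optimized_ratio(t.unoptimized_ms, t.optimized_ms) > 1.0;
}
```

With these figures, optimized_compile_was_faster is true for bcc64, clang, and both g++ builds, and false for Microsoft cl and Intel icl.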

A Final Note

I want to be clear about one point: we can’t really compare these tests to the Linux tests in the earlier article, because we’re in a completely different environment. Not only am I on a different operating system, I’m actually on a slower machine: this is just a quad core AMD unit. (With the previous article, some readers mentioned that I didn’t include all the specs for the computer; I did this time, but I want to be clear that my goal here is to show how the compilers performed relative to each other. Obviously, if you run these on a faster machine, you’ll get shorter times. And further, the specs of the computer certainly shouldn’t affect the final size of the executables.)


Image: wrangler/Shutterstock.com

Comments

  1. BY Guti says:

    Very impressive your results with BCC64.
    I am glad they switched to CLANG, as the only way to avoid their poor quality generated code in terms of performance. Unfortunately, the old compiler is still there on BCC32.

  2. BY Doug says:

    Interesting data. Charts would be nice

  3. BY Louise says:

    This test is USELESS ! (sorry)

    Don’t get me wrong – fine article but the test itself tells you NOTHING about the compiler and the QUALITY of code generated.
    To do real compiler testing you should grab some nice library (for example vorbis or ffmpeg or something similar) and generate an executable using each of the compilers you mention (that alone would not be a trivial task as they differ a lot in details). Then you should launch, for example, ogg or movie compression using the generated executable, and compare timings of that – this will tell you how good the compiler was at optimizing code. (To be fair, all versions of the test library should be ‘generic’: do not use hand-written assembly; maybe two versions, one with intrinsics enabled, the second with them disabled, etc.) The test would tell us something about the performance of generated code, which is a MAJOR factor while choosing a compiler (or in reality choosing Intel over msvc, as the other ones (llvm, gcc) generate code that is waaaaay slower ;))
    Compilation time and object size is really a minor factor in the c++ world.
