Tutorial :VS 2008 C++ build output?


Why when I watch the build output from a VC++ project in VS do I see:

1>Generating code...

The output looks as though several compilation units are being handled before any code is generated. Is this really going on? I'm trying to improve build times, and by using pre-compiled headers, I've gotten great speedups for each ".cpp" file, but there is a relatively long pause during the "Generating Code..." message. I do not have "Whole Program Optimization" nor "Link Time Code Generation" turned on. If this is the case, then why? Why doesn't VC++ compile each ".cpp" individually (which would include the code generation phase)? If this isn't just an illusion of the output, is there cross-compilation-unit optimization potentially going on here? There don't appear to be any compiler options to control that behavior (I know about WPO and LTCG, as mentioned above).

The build log just shows the ".obj" files in the output directory, one per line. There is no indication of "Compiling..." vs. "Generating code..." steps.

I have confirmed that this behavior has nothing to do with the "maximum number of parallel project builds" setting in Tools -> Options -> Projects and Solutions -> Build and Run. Nor is it related to the MSBuild project build output verbosity setting. Indeed if I cancel the build before the "Generating code..." step, none of the ".obj" files will exist for the most recent set of "compiled" files. This implies that the compiler truly is handling multiple translation units together. Why is this?


Compiler architecture

The compiler is not generating code from the source directly, it first compiles it into an intermediate form (see compiler front-end) and then generates the code from the intermediate form, including any optimizations (see compiler back-end).

Visual Studio compiler process spawning

In a Visual Studio build compiler process (cl.exe) is executed to compile multiple source files sharing the same command line options in one command. The compiler first performs "compilation" sequentially for each file (this is most likely front-end), but "Generating code" (probably back-end) is done together for all files once compilation is done with them.

You can confirm this by watching cl.exe with Process Explorer.

Why code generation for multiple files at once

My guess is Code generation being done for multiple files at once is done to make the build process faster, as it includes some things which can be done only once for multiple sources, like instantiating templates - it has no use to instantiate them multiple times, as all instances but one would be discarded anyway.

Whole program optimization

In theory it would be possible to perform some cross-compilation-unit optimization as well at this point, but it is not done - no such optimizations are ever done unless enabled with /LTCG, and with LTCG the whole Code generation is done for the whole program at once (hence the Whole Program Optimization name).

Note: it seems as if WPO is done by linker, as it produces exe from obj files, but this a kind of illusion - the obj files are not real object files, they contain the intermediate representation, and the "linker" is not a real linker, as it is not only linking the existing code, it is generating and optimizing the code as well.


It is neither parallelization nor code optimization.

The long "Generating Code..." phase for multiple source files goes back to VC6. It occurs independent of optimizations settings or available CPUs, even in debug builds with optimizations disabled.

I haven't analyzed in detail, but my observations are: They occur when switching between units with different compile options, or when certain amounts of code has passed the "file-by-file" part. It's also the stage where most compiler crashes occured in VC6 .

Speculation: I've always assumed that it's the "hard part" that is improved by processing multiple items at once, maybe just the code and data loaded in cache. Another possibility is that the single step phase eats memory like crazy and "Generating code" releases that.

To improve build performance:

Buy the best machine you can afford
It is the fastest, cheapest improvement you can make. (unless you already have one). Move to Windows 7 x64, buy loads of RAM, and an i7 860 or similar. (Moving from a Core2 dual core gave me a factor of 6..8, building on all CPUs.)

(Don't go cheap on the disks, too.)

Split into separate projects for parallel builds
This is where 8 CPUS (even if 4 physical + HT) with loads of RAM come to play. You can enable per-project parallelization with /MP option, but this is incompatible with many other features.


At one time compilation meant parse the source and generate code. Now though, compilation means parse the source and build up a symbolic database representing the code. The database can then be transformed to resolved references between symbols. Later on, the database is used as the source to generate code.

You haven't got optimizations switched on. That will stop the build process from optimizing the generated code (or at least hint that optimizations shouldn't be done... I wouldn't like to guarantee no optimizations are performed). However, the build process is still optimized. So, multiple .cpp files are being batched together to do this.

I'm not sure how the decision is made as to how many .cpp files get batched together. Maybe the compiler starts processing files until it decides the memory size of the database is large enough such that if it grows any more the system will have to start doing excessive paging of data in and out to disk and the performance gains of batching any more .cpp files would be negated.

Anyway, I don't work for the VC compiler team, so can't answer conclusively, but I always assumed it was doing it for this reason.


There's a new write-up on the Visual C++ Blog that details some undocumented switches that can be used to time/profile various stages of the build process (I'm not sure how much, if any, of the write-up applies to versions of MSVC prior to VS2010). Interesting stuff which should provide at least a little insight into what's going on behind the scenes:

If nothing else, it lets you know what processes, dlls, and at least some of the phases of translation/processing correspond to which messages you see in normal build output.


It parallelizes the build (or at least the compile) if you have a multicore CPU

edit: I am pretty sure it parallelizes in the same was as "make -j", it compiles multiple cpp files at the same time (since cpp are generally independent) - but obviously links them once.
On my core-2 machine it is showing 2 devenv jobs while compiling a single project.

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Next Post »