Announcement

Monday, March 02, 2009

Portable Precompiled Headers for C/C++

If you are writing a fairly big cross-platform application, one problem you may encounter is how to reduce compilation time, which is the problem precompiled headers are meant to address.

One way to reduce compilation time is to reduce the number of times the same file is #included. Usually we add an #ifndef/#define/#endif guard at the start and end of each header file to avoid problems with multiple includes. However, the compiler may still have to read the entire file to find the #ifndef and its matching #endif, only to throw the contents away on the second and every subsequent inclusion.
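A minimal single-file sketch of the guard described above: the same guarded header body is pasted twice to stand in for two #include "point.h" lines. The names point.h, POINT_H and Point are illustrative only. Without the guard, the second copy would redefine struct Point and fail to compile; with it, the second copy is skipped, but only after the preprocessor has scanned it.

```cpp
// First #include "point.h": POINT_H is not defined yet, so the body is kept.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif  // POINT_H

// Second #include "point.h": POINT_H is already defined, so everything up to
// the matching #endif is discarded. Without the guard this second struct
// Point would be a redefinition error. The preprocessor still has to scan
// these lines to find the #endif, which is the cost described above.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif  // POINT_H

// Uses the single surviving definition of Point.
constexpr int sum(Point p) { return p.x + p.y; }
```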

On Windows, the Visual Studio compiler has an option to use precompiled headers, which are supposed to solve this problem. However, in my experience, turning precompiled headers ON sometimes actually increases compilation time. This is typically the case when files that are modified regularly are part of the precompiled header, because every change to them forces the precompiled header, and everything that uses it, to be rebuilt. The feature is also specific to that compiler and does not carry over to other compilers or platforms.

The technique below reduces compilation time by ensuring that each header file (.h) is included ONLY once. It also has some nice side benefits: it defines the include sequence correctly, simplifies cross-platform includes (e.g. using stdlib.h on Windows but unistd.h on Unix), makes it easier to change include paths, and so on.

The technique relies only on #ifdef/#endif pairs. Since #ifdef and #endif are part of every C++ preprocessor, it works with all C++ compilers.

Here are the steps:
  1. In each module of your project, add a module-specific include file, module_inc.h.
  2. Suppose you want to add a new header, a.h, to the module, and a.h depends on b.h and c.h (while b.h in turn depends on c.h and d.h). Make the following changes in module_inc.h:

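    // Dependency sections: dependents first, so each one can switch on
    // the sections that follow it.

    // define dependencies of a.h
    #ifdef A_INC
    #define B_INC // a.h depends on b.h
    #define C_INC // a.h depends on c.h
    #endif //A_INC

    // define dependencies of b.h
    #ifdef B_INC
    #define C_INC // b.h depends on c.h
    #define D_INC // b.h depends on d.h
    #endif //B_INC

    // Include sections: reverse order, so dependencies come first.
    #ifdef D_INC
    #include "d.h"
    #endif

    #ifdef C_INC
    #include "c.h"
    #endif

    #ifdef B_INC
    #include "b.h"
    #endif

    #ifdef A_INC
    #include "a.h"
    #endif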

  3. Next, to express a.h's dependency on b.h and c.h in a.h itself, add the following lines to a.h:

    #ifndef A_H
    #define A_H

    #define B_INC // a.h depends on b.h
    #define C_INC // a.h depends on c.h

    // ... declarations of a.h ...

    #endif // A_H
  4. To use a.h in a.cpp, add the following lines to a.cpp:

    #define A_INC              // a.cpp needs a.h
    #include "module_inc.h"    // pulls in a.h and all of its dependencies
That's it.
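To see the four steps working together without creating real files, the sketch below pastes a condensed module_inc.h into a single translation unit. Each #include "x.h" is replaced by a tiny stand-in struct; the structs A to D and their id members are hypothetical placeholders, not part of the technique. The code compiles only if every "header" is emitted after the ones it uses and none is emitted twice.

```cpp
// ---- a.cpp (step 4): this file asks for a.h only ----
#define A_INC

// ---- module_inc.h (step 2), pasted inline for this sketch ----
// Dependency sections come first, dependents before their dependencies.
#ifdef A_INC
#define B_INC  // a.h depends on b.h
#define C_INC  // a.h depends on c.h
#endif  // A_INC

#ifdef B_INC
#define C_INC  // b.h depends on c.h; an identical redefinition is allowed
#define D_INC  // b.h depends on d.h
#endif  // B_INC

// Include sections follow in reverse order, so dependencies come first.
// Each tiny struct stands in for the contents of one real header.
#ifdef D_INC
struct D { static constexpr int id = 4; };  // instead of #include "d.h"
#endif

#ifdef C_INC
struct C { static constexpr int id = 3; };  // instead of #include "c.h"
#endif

#ifdef B_INC
struct B { static constexpr int id = C::id + D::id; };  // b.h uses C and D
#endif

#ifdef A_INC
struct A { static constexpr int id = B::id + C::id; };  // a.h uses B and C
#endif
// ---- end of module_inc.h ----
```

The order of the dependency sections matters: if b.h's section were placed above a.h's, the preprocessor would already have skipped it by the time A_INC switches B_INC on, so D_INC would never be defined and B's use of D would fail to compile.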

Advantages of defining include files and dependencies this way

This structure has a few distinctive advantages:
  • First, a.h is directly included only in module_inc.h. Everywhere else, module_inc.h is included and only A_INC is defined.
  • Even though a.h depends on b.h and c.h, neither b.h nor c.h is directly included in a.h or in a.cpp. Still, b.h and c.h are preprocessed in the correct sequence and only once.
  • It is very difficult to create circular dependencies between include files in this framework. Compilation will break if there is a circular dependency, because one of the two headers is always emitted first and cannot see the definitions it needs from the other. That is a good thing.
  • Since a.h is never directly included in a.cpp or any other .cpp file in the module, (a) you can change the directory or path of a.h by changing the path in module_inc.h, and (b) you can split the file into two header files, a1.h and a2.h, and change the includes in module_inc.h. In both cases, the changes are localized to module_inc.h; once they are made, you just need to recompile your project. Imagine how hard this would be without this kind of structure.
  • The order in which A_INC, B_INC or C_INC are defined does not matter; the files are always included in the correct sequence. So there is no need to correct the order of #include statements in different .cpp files when class dependencies change.
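The cross-platform benefit mentioned at the start falls out of the same structure: platform-specific system headers can be chosen in one place in module_inc.h instead of in every .cpp file. The sketch below assumes a hypothetical SYS_INC switch and a hypothetical page_size() helper. The headers and calls it uses (windows.h with GetSystemInfo, unistd.h with sysconf(_SC_PAGESIZE)) are real, but they are chosen only for illustration.

```cpp
#include <cassert>

// ---- a.cpp: this file needs the (hypothetical) system layer ----
#define SYS_INC

// ---- module_inc.h excerpt: the only place that decides which
// platform header provides the operating-system calls ----
#ifdef SYS_INC
#  ifdef _WIN32
#    include <windows.h>
#  else
#    include <unistd.h>
#  endif
#endif  // SYS_INC

// Hypothetical module helper; it compiles against whichever
// header module_inc.h selected above.
inline long page_size() {
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    long size = static_cast<long>(info.dwPageSize);
#else
    long size = sysconf(_SC_PAGESIZE);
#endif
    assert(size > 0 && "page size query failed");
    return size;
}
```

A .cpp file only writes #define SYS_INC before #include "module_inc.h"; if a platform later needs a different header, only module_inc.h changes.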

How exactly does this work?

Let's look at what happens when the compiler processes a.cpp:
  1. First A_INC is defined.
  2. The compiler (i.e. the preprocessor) starts processing module_inc.h.
  3. module_inc.h has a preprocessor directive, #ifdef A_INC. This section defines the dependencies of a.h by defining two other symbols, B_INC and C_INC.
  4. The preprocessor continues through module_inc.h and encounters #ifdef B_INC. This section defines the dependencies of b.h (C_INC and D_INC).
  5. Preprocessing continues, and the preprocessor encounters #ifdef C_INC; this section actually includes c.h (i.e. #include "c.h"). Notice that by this time C_INC has been defined TWICE, since a.h depends on c.h and b.h also depends on c.h. However, the actual file is included and processed only ONCE (i.e. #include "c.h" is executed only once).
  6. Processing continues, and b.h and a.h are now included and preprocessed. Notice that the sequence of #include directives is the reverse of the sequence of dependency definitions. This guarantees that files are included in the correct sequence, irrespective of the order of the #define lines in the .cpp files.
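The walk-through can be checked mechanically with a single-file sketch. Stand-in structs C, B and A (hypothetical placeholders) replace the real headers, and each records, via the standard __LINE__ macro, the source line at which the preprocessor emitted it. The .cpp part deliberately requests c.h before a.h, yet the emitted order is still c, b, a, and C_INC being defined several times never produces a second copy of c.h.

```cpp
// ---- some.cpp: requests made in the "wrong" order ----
#define C_INC  // this file uses c.h directly ...
#define A_INC  // ... and also a.h, which needs b.h and c.h

// ---- module_inc.h, pasted inline for this sketch ----
#ifdef A_INC
#define B_INC  // a.h depends on b.h
#define C_INC  // a.h depends on c.h (C_INC is now defined twice)
#endif  // A_INC

#ifdef B_INC
#define C_INC  // b.h depends on c.h (a third, harmless definition)
#endif  // B_INC

// Include sections in reverse order. Each stand-in "header" records
// the source line at which the preprocessor emitted it.
#ifdef C_INC
struct C { static constexpr int line = __LINE__; };
#endif

#ifdef B_INC
struct B {
    static constexpr int line = __LINE__;
    static constexpr int uses = C::line;  // compiles only if C came first
};
#endif

#ifdef A_INC
struct A {
    static constexpr int line = __LINE__;
    static constexpr int uses = B::line + C::line;  // needs B and C
};
#endif
```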
I have used this technique in two large-scale, cross-platform C++ development projects (150,000+ lines, 25+ man-years of development). It sped up full rebuilds, avoided dependency mistakes, and caught potential circular dependencies early in the project life cycle.

I always try to use techniques that help either avoid mistakes or detect them early. This is one technique that definitely helps.
