Autodiscovering Build System

Why cook?

Traditionally, software projects are built using a system based on the venerable make utility. This usually involves populating the source tree with Makefiles, each of which contains information about the ingredients for a build and the dependencies between them.

Makefiles are usually maintained by developers, which often involves collecting information from various sources and inserting it in the proper format. As this is error-prone, some automation was introduced, but usually in an ad-hoc manner and without recognizing that, short of a completely automated system, a build will always carry risks:

Several attempts were made to address some of these issues:

Of all the attempts to make builds more reliable, ClearCase's clearmake is probably the most successful. There are many shops that use it with good results, and the configuration records can provide a valuable tool not only for recording what went into a build, but also for tracking down problems.

Nevertheless, clearmake comes at a price:

Ordinary make and GNU make are handicapped by a long evolutionary history, made evident by a rather crufty syntax and a lack of fundamental capabilities, which would have been easy to implement in a clean slate design. Peter Miller's cook is such a design, and it has features not found in any make:

Why Autodiscovery?

The main goal is factorization. Data should not be needlessly replicated all over the place.

Much of the build data can be stored in the structure of the source tree. A directory that contains .c files is very likely to contain code for some library or executable. So, let's just add some conventions of positioning and naming of those directories and use that to determine our build.

References to other build objects can also be determined from the source code. For example, the #include's of header files can be used as a clue that the library associated with the header files is to be used.

Linkage to third party objects should be defined centrally. This vastly simplifies the task of upgrading to new versions and avoids having to re-explain over and over again how to link to certain complicated objects, e.g. SQL code generators and the like.

What you see is what you get. By using the source tree structure we avoid many of the pitfalls of classic Makefiles. We can impose simple build principles:

Definitions

derived object

A derived object (DO) is any file that is generated as the result of the build process. DOs are usually placed into DO subtrees and not mingled with the source files. This not only keeps the source tree clean but also keeps multiple parallel builds from stepping on each other.

platform

A platform is usually a combination of a particular operating system version with some kind of hardware. The name of the current platform is determined by running the ct-uname wrapper script, and all the logic for determining the platform name is encoded in that script.

A platform usually implies a specific set of compilers and third party tools. The current version of the build system does not support cross-compilation.

variant

A variant is whatever you want it to be, but is distinguished from a platform in the sense that all variants on the same platform will usually share most tools and third party objects.

A variant name is usually composed of a set of attributes joined by the "_" (underscore) character. Various build rules and definitions may check for the existence of a certain attribute in the variant name and modify compiler flags and other things accordingly.
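The attribute check described above can be sketched as follows. This is only an illustration, not the build system's own code: the attribute names "opt" and "profile" and the flag mapping are hypothetical, and the real rules are written as cook definitions rather than Python.

```python
def variant_attributes(variant_name):
    """Split a variant name such as 'debug_profile' into its attributes."""
    return [attr for attr in variant_name.split("_") if attr]


def has_attribute(variant_name, attribute):
    """Return True when the attribute is one of the underscore-joined parts."""
    return attribute in variant_attributes(variant_name)


def compiler_flags(variant_name):
    """Hypothetical mapping from attributes to flags, for illustration only."""
    flags = []
    if has_attribute(variant_name, "debug"):
        flags.append("-g")
    if has_attribute(variant_name, "opt"):
        flags.append("-O2")
    return flags
```

Matching whole attributes rather than substrings means a variant called "debugger" would not accidentally pick up the settings for "debug".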

top-level target

A top level target is usually one file composed of an aggregation of other source files or DOs. This file is most often the final build product, or at least a component of the deliverable runtime environment. An executable, a library, or a jar file are examples of top level targets.

Source Tree Layout

The source tree can take any form the development team finds convenient, as long as some simple rules are followed:

It may be tempting to make the structure of the source tree reflect the class inheritance hierarchy, but experience has shown that this is very hard to maintain consistently over time. Also, source code ends up being hard to find, especially for new people unfamiliar with the code.

In practice, it is often better to use a simple, flat layout like this:

projectdir/
 |_DO/variant/platform
 |_runtime/variant/platform
 |  |_bin/                    Symlinks to executables
 |  |_lib/                    Symlinks to shared objects
 |_Howto.cook@                -> ../build/cook/include/main
 |_src/
    |_DO/variant/platform
    |_lib/
    |  |_DO/variant/platform
    |  |_a-m/
    |  |  |_DO/variant/platform
    |  |  |_liba/
    |  |  |  |_DO/variant/platform
    |  |  |  |_Howto.cook      "make archive;"
    |  |  |  |_class1.h
    |  |  |  |_class1.c
    |  |  |  |_class2.h
    |  |  |  |_class2.c
    |  |  |  |   ...
    |  |  |  |_Test/
    |  |  |     |_DO/variant/platform
    |  |  |     |_Howto.cook   "make test;"
    |  |  |     |_test1.h
    |  |  |     |_test1.c
    |  |  |     |_test2.h
    |  |  |     |_test2.c
    |  |  |    ...
    |  |  | 
    |  |  |_libb/
    |  |  |  |_DO/variant/platform
    |  |     |_Howto.cook      "make archive;"
    |  |    ...
    |  |_n-z/
    |     |_DO/variant/platform
    |     |_libn/
    |     |  |_DO/variant/platform
    |     |  |_Howto.cook      "make archive;"
    |     | ...
    |     |_libo/
    |        |_DO/variant/platform
    |        |_Howto.cook      "make archive;"
    |       ...
    |_bin 
       |_DO/variant/platform
       |_utils
       |  |_DO/variant/platform
       |  |_Howto.cook         "make simplex;"
       |  |_util1.c
       |  |_util2.c
       | ...
       |_bigprog
          |_DO/variant/platform
          |_Howto.cook         "make complex;"
          |_class1.h
          |_class1.c
          |_class2.h
          |_class2.c
          |   ...
          |_Test/
             |_DO/variant/platform
             |_Howto.cook      "make test;"
             |_test1.h
             |_test1.c
             |_test2.h
             |_test2.c
            ...

Note that even though top level targets may not refer to files outside of their containing directories, they may be nested. Any source files belonging to the nested top level targets no longer belong to the containing top level target. In the source tree example above, the top level target src/bin/bigprog does not own any files in src/bin/bigprog/Test.
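The ownership rule for nested targets can be sketched as a search for the deepest enclosing target directory. This is a minimal sketch under the assumption that paths are relative and use "/" as the separator. The function name and the set of target directories are hypothetical and stand in for whatever the build system derives from the Howto.cook files.

```python
import posixpath


def owning_target(source_path, target_dirs):
    """Return the deepest directory in target_dirs that contains source_path.

    target_dirs holds the directories whose Howto.cook has a "make" line.
    Returns None when no target directory encloses the file.
    """
    directory = posixpath.dirname(source_path)
    while directory and directory != "/":
        if directory in target_dirs:
            return directory
        # Move one level up and try again.
        directory = posixpath.dirname(directory)
    return None
```

With the target directories src/bin/bigprog and src/bin/bigprog/Test, a file in the Test directory resolves to the Test target first, so it never reaches the containing bigprog target.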

Also note that every directory relevant to the build system has a DO subtree. Directories without a DO subtree appropriate for the selected variant and platform are ignored.

Note the runtime subtree at the top. This runtime tree is intended to be the root of a symlink farm which maps the shape of the production runtime tree to the existing development tree.

Finally, note that the layout above is only an example. The build system has no requirement that the source tree be organized in this way. You can choose to have deeply nested targets. The only restrictions that exist are:

What follows is a description of all top level target types:

Simple Executable

A simple executable is exactly one file which gets compiled and linked into one executable. A directory with a Howto.cook file containing the line "make simplex;" will cause every file in the directory and subdirectories to be compiled into one executable each.

Complex Executable

A complex executable is a collection of source files which get compiled and linked together into one executable. A directory with a Howto.cook file containing the line "make complex;" will cause all files except the one with the same basename as the directory to be aggregated into one archive (static library). The remaining file is assumed to contain the main() function and is then linked against that library and any other libraries it may depend on.
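The split between the main() file and the archive members could look like the sketch below. It assumes that "same basename" means the file name without its extension, so that a hypothetical bigprog.c in src/bin/bigprog is the main file. The function name is illustrative only.

```python
import posixpath


def split_complex_sources(directory, sources):
    """Separate the main() file from the files that go into the archive.

    The main file is the one whose name, minus its extension, matches the
    basename of the directory.
    """
    dir_name = posixpath.basename(directory.rstrip("/"))
    main_files = []
    archive_files = []
    for source in sources:
        stem = posixpath.splitext(posixpath.basename(source))[0]
        if stem == dir_name:
            main_files.append(source)
        else:
            archive_files.append(source)
    return main_files, archive_files
```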

The static library produced in this process is available for external linking, allowing for example the creation of test suites which exercise code in the complex executable. To gain the full benefit of this method, it is recommended to keep the size of the file containing the main() function as small as possible, since the code therein is not available for testing.

Archive (Static Library)

This is just your standard static library. A directory with a Howto.cook file containing the line "make archive;" will cause all files to be compiled and aggregated into a static library, available for external linking.

Shared Object

This is just like a library, except for being loaded at runtime instead of statically copied at link time. A directory with a Howto.cook file containing the line "make dump_so;" will cause all files to be compiled and aggregated into a shared object, available for external linking.

Loadable Module

A loadable module is a shared object which has all of its dependencies registered, so that they can be resolved at runtime. Effectively, the loadable module is linked like an executable, but behaves like a shared object. The loading program is no longer responsible for knowing about the dependencies of the object being loaded. The Howto.cook file in the directory of a loadable module must contain the line "make smart_so;".

Test Suite

A test suite is a collection of source files defining methods that are all called using a well-defined convention. A directory with a Howto.cook file containing the line "make test;" will cause a source file containing a main() function calling all of the test methods to be generated, compiled and linked together with all of the test methods and their dependencies.

Third Party Build

A directory containing a Howto.cook file with the line "make sub_build;" will cause a hand-off to a third party build system. The same directory must contain a build.sh shell script which executes the handoff.

When the build system processes a sub_build top level target, all of the sources which belong to the sub_build target are incrementally copied into the DO subtree, then the current working directory is set to the DO subtree and the build.sh script is executed.

All the stdout output produced by the sub_build is stored in a log file. This log file is also the actual product of the sub_build, and any dependencies between the third party build and any other component of the source tree must be expressed as a dependency to or from that logfile.

Running Cook

Cook is always invoked at the top level of the source tree. This is the highest location containing a Howto.cook file, which is termed the top level Howto.cook file. Note that all the paths used within the Howto.cook files are relative paths, rooted at this top level.

In order to avoid having to constantly change directories, a wrapper script called "b" is provided which will do the switching for you, while redirecting the build log to the current directory and passing in the appropriate command line options to limit the build to the targets in your current directory.

In addition to the standard cook options, which can be listed by running cook --help, the following options are peculiar to this build system:

top=location
Limit the build to targets contained in the specified location. The "b" wrapper uses this to pass in the current location.
variant=variant
Build specified variant. The default variant is debug.
fast=y
Assume all the dependency files are up to date and skip the verification.
skip=y
Skip the compilation of any files outside of the directories specified in top=location, usually the current directory and its subdirectories. This may cause inconsistencies, but this option is provided for developers who claim to know when a rebuild of files outside the current directory is not required.

In addition, any object file in the current directory may be created directly by specifying it as a build target.

Phases

Autodiscovery is actually quite complex, since it needs to bootstrap itself from a simple, top level invocation. In addition, the autodiscovery process needs to be as efficient as possible. This efficiency is achieved mainly by splitting the information into many small files and organizing the generation of these files using the same techniques as are used for the build proper. This means that files are only rebuilt when some of their sources have changed.

What follows is a detailed description of the autodiscovery phases.

Initialization

The top level Howto.cook file is essentially a skeleton. The meat is provided by various include files which live under build/cook/include. The top level Howto.cook file will read in all of the include files in the following directories, ignoring files which look like editor backup files:

build/cook/include/functions
Files in this directory define various cook functions used in various places. The standard is one file per function, the file name being the same as the function name.
build/cook/include/defs
Files in this directory define which operations actually exist and what their command line syntax and options are. They also define the default values for these options.
build/cook/include/collectors
Files in this directory contain code which maps the set of source files to the set of object files and dependency files.
build/cook/include/make
Files in this directory define the supported top level targets. For every top level target, functions which group the set of object files and assign them to a target are defined, as are various parameters governing the creation of link dependencies.
build/cook/include/use
Files in this directory define various canned settings for using third party tools or for building special kinds of objects. The values therein will override the defaults defined in build/cook/include/defs.
build/cook/include/phase_n
Files in these directories contain cook recipes which define how files are compiled and linked. These files are included incrementally as the build progresses through its various phases.

All build rules have dependencies on the files containing the definitions they use. Changing a definition file will cause all DOs produced by rules using these definitions to be rebuilt.

Source File Collection (Phase 1)

This is the only recursive part of the build. The end result of this phase is that the build system has a list of source files. This list will only include files that are:

The DO tree is itself under version control and looks like this: DO/variant/platform. This tree not only serves as a repository for DO's, but mainly serves to make selective parts of the source tree visible or invisible to specific variant/platform combinations. Use this instead of #ifdef's.
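The visibility rule can be sketched as a directory walk that only accepts directories carrying a matching DO/variant/platform subtree. This is a hedged illustration, not the actual collector: the function name and suffix list are hypothetical, and the sketch assumes that a directory without a matching DO subtree also hides everything beneath it.

```python
import os


def collect_sources(root, variant, platform, suffixes=(".c", ".h")):
    """Return source files from directories that have a
    DO/<variant>/<platform> subtree; other directories are ignored."""
    collected = []
    for dirpath, dirnames, filenames in os.walk(root):
        do_tree = os.path.join(dirpath, "DO", variant, platform)
        if not os.path.isdir(do_tree):
            # Assumption: an ignored directory hides its whole subtree.
            dirnames[:] = []
            continue
        # Never descend into the derived-object trees themselves.
        dirnames[:] = [d for d in dirnames if d != "DO"]
        for name in filenames:
            if name.endswith(suffixes):
                full_path = os.path.join(dirpath, name)
                collected.append(os.path.relpath(full_path, root))
    return sorted(collected)
```

Creating or removing a DO/variant/platform directory is then enough to switch a whole part of the tree on or off for one variant and platform, with no change to the sources.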

Local Customizations (Phase 2)

Any non-DO directory in the source tree may contain a Howto.cook file. This file contains definitions that are only valid in that directory and all of its subdirectories. The customization phase translates those Howto.cook files in order to localize all variable assignments and to parse references to modules. All DO's built in that directory and its subdirectories have a dependency to that file, so changing that file will result in those DO's being rebuilt.

A local Howto.cook file may contain any valid cook rule or statement, although care must be taken to remember that all paths are rooted from the position of the top level Howto.cook file. The following constructs have special meaning and are parsed and expanded by the customization process:

variable += value;
This creates a localized variable, with value appended to the current value of this variable from a higher level Howto.cook file, or to the default value as defined in build/cook/include/defs. Note that the whole assignment must be on one line. The parser is somewhat stupid about this.
variable := value;
This is like the += construct, except that the value replaces the previous value instead of being appended.
make top level target;
This causes all source files in this directory and its subdirectories to be associated with the specified top level target. If a subdirectory contains a Howto.cook file with its own make statement, then the files associated with that subtree are removed from the top level target of the containing directory. This allows one to have nested top level targets. The definitions associated with top level targets are stored in build/cook/include/make.
use module;
Use the canned settings defined in the specified module. Modules are stored in build/cook/include/use and allow a central definition of all the flags and settings required to link to third party objects or rarely used system libraries.
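How the += and := constructs combine along a directory path can be sketched as below. The data structures are hypothetical stand-ins: defaults plays the role of build/cook/include/defs, and local_assignments holds the assignments already parsed from each Howto.cook file. The variable name cc_flags in the usage note is likewise only illustrative.

```python
def resolve_variable(name, directory, defaults, local_assignments):
    """Compute the value of `name` as seen from `directory`.

    defaults: dict of variable -> list of words.
    local_assignments: dict of directory -> list of (variable, op, words),
    where op is "+=" (append) or ":=" (replace). The top level is "".
    """
    value = list(defaults.get(name, []))
    parts = [p for p in directory.split("/") if p]
    # Visit the top level first, then each directory on the way down.
    prefixes = [""] + ["/".join(parts[:i + 1]) for i in range(len(parts))]
    for prefix in prefixes:
        for variable, op, words in local_assignments.get(prefix, []):
            if variable != name:
                continue
            if op == "+=":
                value = value + list(words)
            elif op == ":=":
                value = list(words)
            else:
                raise ValueError("unsupported operator: " + op)
    return value
```

For example, with a default of -Wall, a "cc_flags += -g;" in src and a "cc_flags := -O2;" in src/lib/a-m/liba, code under src/bin sees "-Wall -g" while liba sees only "-O2".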

Include Dependencies (Phase 3)

This is the grandchild of makedepend. It is extremely fast, but relies on some heuristics that need to be understood in order to maintain the integrity of the incremental builds.

Every C/C++ source file is scanned for #include directives, and a so-called "cascade dependency" is generated for every file referenced in a #include, provided that it exists in the source tree.

A cascade dependency is a cook speciality, essentially a rule that says: "If you need file X, you will also need file Y." In other words, as soon as a dependency "A: X" is detected, an "A: Y" dependency is automatically added. Cascade dependencies allow for a flat scan of header files: each file is scanned once, without actually expanding the files it includes, which would be very wasteful because many header files would be parsed over and over again.
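The flat scan and the way cascades are followed afterwards can be sketched as follows. This is a simplified illustration, not the build system's scanner: header names are matched literally against the set of known files, whereas a real implementation would resolve them against include paths, and all function names are hypothetical.

```python
import re

# Matches #include "x.h" and #include <x.h>; conditionals are not evaluated.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^">]+)[">]', re.MULTILINE)


def direct_includes(text):
    """Return the file names named in #include directives of one file."""
    return INCLUDE_RE.findall(text)


def build_cascades(sources):
    """Map each known file to the known files it includes directly.

    sources: dict of file name -> file contents. Includes that do not name
    a file in `sources` are dropped, since no dependency is generated for
    files outside the source tree.
    """
    return {
        name: [inc for inc in direct_includes(text) if inc in sources]
        for name, text in sources.items()
    }


def ingredients(source_file, cascades):
    """Follow cascades from one source file to every header it needs."""
    needed = []
    pending = [source_file]
    seen = {source_file}
    while pending:
        current = pending.pop()
        for header in cascades.get(current, []):
            if header not in seen:
                seen.add(header)
                needed.append(header)
                pending.append(header)
    return sorted(needed)
```

Each file is read exactly once in build_cascades. The transitive set for a given object file is then derived purely from the stored cascades, which is what makes the scan cheap.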

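There are two problems with this flat scanning method: conditional compilation is not evaluated, so a #include that is #ifdef'ed out is still seen, and generated header files may not yet exist when the scan runs.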

The first problem is solved essentially by ignoring it. Should a #include reference a file that exists, then the additional dependency can't really hurt. It may cause a DO to be rebuilt unnecessarily, but that's it. If the referenced header file doesn't exist, then no dependency is generated, and either the reference is actually #ifdef'ed out, or there will be a compiler error, which is just as well.

The case of generated header files is more difficult and will be addressed in detail in the code generator section. Essentially, the problem is solved by creating a so-called docking header file that always exists and that will include the generated header file. This way, a dependency will exist and everything will be fine again.

Library Dependencies (Phase 4)

Once all the header file dependencies are known, these dependencies are matched with the known locations of the top level targets, and appropriate cascade dependencies (if you use libA, you need libB) and real dependencies (if libA changes, relink) are created.

Program Dependencies (Phase 5)

Program dependencies on libraries are generated by creating the transitive closure of all library dependencies of the libraries used by that program. A simple topological sort of all dependencies will yield a linear search order for libraries that can be used to link and produce the executable.

Since a topological sort is used, it is essential that there be no cyclic dependencies between libraries. There shouldn't be any. If a cycle is encountered, it should be broken up, either by isolating the cycle and putting it into a separate library, or by merging the mutually dependent libraries into one big library.
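The closure and sort can be sketched with Python's standard graphlib module (Python 3.9 or later). This is an illustration of the technique rather than the build system's implementation. The function name and the input dictionaries are hypothetical, and the choice to list a library before the libraries it depends on is an assumption based on how traditional single-pass linkers read their command line.

```python
from graphlib import CycleError, TopologicalSorter


def link_order(program_libs, lib_deps):
    """Return libraries in link order for one program.

    program_libs: libraries the program uses directly.
    lib_deps: dict of library -> libraries it depends on.
    Raises CycleError when the libraries depend on each other cyclically.
    """
    # Transitive closure: collect everything reachable from program_libs.
    closure = set()
    pending = list(program_libs)
    while pending:
        lib = pending.pop()
        if lib not in closure:
            closure.add(lib)
            pending.extend(lib_deps.get(lib, []))
    graph = {lib: set(lib_deps.get(lib, [])) for lib in closure}
    # static_order() yields dependencies first, so reverse it for linking.
    order = list(TopologicalSorter(graph).static_order())
    order.reverse()
    return order
```

A cycle surfaces as a CycleError instead of a silently wrong order, which matches the requirement that cycles be broken up or merged before the build can proceed.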

Build (Phase 6)

At this stage, all the dependencies are known and the build proper can proceed.

Inspecting and Debugging the Build

When creating a DO, cook will also create two files which help inspecting and debugging the build. One file contains the list of all files required as ingredients for building the file in question. This file has the same name as the DO, with a ".need" extension tacked on to it. The other file lists those files which caused the DO to be rebuilt. This one has the ".why" extension appended to the DO name.

When operating under ClearCase, you can recursively track down the .need or the .why files using the ct-catcr script. This script will dump out the version extended names of all the source files which contributed towards the DO. By keeping a mapping between this output and the DO, one can reliably track whether a DO has changed between builds and produce patches containing only changed DOs.

Customizing the Build System

Every shop is different, so customizations are inevitable. The system is designed to be easily adaptable.

File Map of the Build System

The layout of the build system, as it is made available in the download section, is as follows:

build/
 |_cook/
    |_bin/platform/          Pre-built cook binaries (Currently only Linux)
    |_usage
    |_version
    |_helper/                Helper scripts, mostly in perl;
    |_include/
       |_main                Intended target for the top level Howto.cook
       |                     symlinks in the actual development trees;
       |_suffixes/platform   File suffixes valid on platform;
       |_paths/platform      Paths to system commands and executables valid
       |                     on platform;
       |_defs/               Definitions of customizable, language specific
       |                     variables (e.g. compiler flags);
       |_use/                Module definitions;
       |_make/               Top-Level target definitions;
       |_rules/              Actual cook rules.

Modules

Autodiscovery can only go so far. There are too many third party objects that all have their own idiosyncrasies and conventions, and autodiscovering which third party tool is used is not practical or even feasible in general.

The module construct exists to tell the build system how to use a third party tool. If a module exists, a developer can easily create a reference to a third party tool or library by inserting the following line in the Howto.cook file that lives in the directory containing the code requiring the module:

 use module;

Modules consist of defining variables prefixed by the module name. The following variables may be used, and any value specified here will be appended to the defaults.

module_cc_D_flags
-D flags passed to the C compiler. Must be specified as "'-DNAME=value'".
module_cc_I_flags
-I flags passed to the C compiler. Must be specified as -Ipath/to/someplace.
module_cc_flags
All other flags passed to the C compiler.
module_cpp_D_flags
-D flags passed to the C++ compiler. Must be specified as "'-DNAME=value'".
module_cpp_I_flags
-I flags passed to the C++ compiler. Must be specified as -Ipath/to/someplace.
module_cpp_flags
All other flags passed to the C++ compiler.
module_ld_l_flags
-l flags passed to the linker. Must be specified as -llib.
module_ld_L_flags
-L flags passed to the linker. Must be specified as -Lpath/to/someplace.
module_ld_flags
All other linker flags.
module_esql_I_flags
-I flags passed to the embedded SQL preprocessor. Must be specified as -Ipath/to/someplace.
module_external_obj
List of full paths to single external object files that must be linked into the executable.
module_external_lib
List of full paths to single libraries that must be linked into the executable.

The following variables override the defaults:

module_esql
Name of the embedded SQL preprocessor.
module_pre_esql
Commands or environment variable definitions that must be prepended to the invocation of the embedded SQL preprocessor.
module_ld
Name of the linker to be used.
module_ar
Name and flags of the archiver to be used. This should be used to control the creation of dynamic vs. static libraries.
module_lib_suffix
Extension of the archive created by module_ar.

As new ways of building things are added, more variables may be defined. These definitions, together with their default values, reside in build/cook/include/defs.
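The difference between appended and overriding module variables can be sketched as follows. The variable suffixes are taken from the lists above, but the function, the dictionaries and the module name "oracle" are hypothetical, and the real mechanism is implemented in cook rather than Python.

```python
# Suffixes whose values are appended to the defaults.
APPENDING = (
    "cc_D_flags", "cc_I_flags", "cc_flags",
    "cpp_D_flags", "cpp_I_flags", "cpp_flags",
    "ld_l_flags", "ld_L_flags", "ld_flags",
    "esql_I_flags", "external_obj", "external_lib",
)
# Suffixes whose values replace the defaults.
OVERRIDING = ("esql", "pre_esql", "ld", "ar", "lib_suffix")


def apply_module(module, module_vars, defaults):
    """Return a copy of defaults with one module's settings applied.

    module_vars uses module-prefixed names, e.g. "oracle_ld_l_flags".
    Appending variables hold lists of words, overriding ones hold strings.
    """
    settings = {
        key: list(value) if isinstance(value, list) else value
        for key, value in defaults.items()
    }
    prefix = module + "_"
    for full_name, value in module_vars.items():
        if not full_name.startswith(prefix):
            continue  # belongs to some other module
        short = full_name[len(prefix):]
        if short in APPENDING:
            settings[short] = settings.get(short, []) + list(value)
        elif short in OVERRIDING:
            settings[short] = value
    return settings
```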

These definitions may reside in several places, and are read in this order:

Definitions for a specific module should reside in a file that has the same name as the module, but don't have to. If they do, a dependency to that file will be generated so that a change to that file will cause all targets using that module to be rebuilt.

How to Support New Platforms

To support a new platform, first edit the get_platform script in the build/cook/helper directory, ensuring that it returns a valid and unique platform string. This string is used as the platform directory name for platform specific directories (e.g. DO subtrees) in various locations, which will need to be created as needed.

Next, copy and edit the files in build/cook/include/suffixes and build/cook/include/paths, ensuring that all the suffix and path definitions are valid for the new platform.

Finally, any platform specific language definitions located in the build/cook/include/defs directory need to be adapted.

How to Use New Languages

How to Define New Top-Level Build Targets

Code Generators

Currently, there is support for lex, yacc and various embedded SQL preprocessors. Source files for these preprocessors reside in the same place as C/C++ files, and the preprocessors are invoked prior to the include dependency phase.

There is one aspect that requires a little bit of care: generated header files. Since include dependencies to non-existing files are ignored, and since generated header files live in the DO subtree, a so-called docking header file must be used to access the generated header file. A docking header file looks like this:

#ifndef _generated_header_h_
#define _generated_header_h_

/* The following macro "stringifies" a constant. This allows constants to be
 * passed in via compiler flags without using fancy quoting mechanisms.
 */

/* Go ask the ANSI committee why you need two levels of indirection here */
#ifndef cook_str
#define cook_str(s)  # s
#define cook_string(s) cook_str(s)
#endif

/* Now include the generated header file */
#include cook_string(OBJDIR/generated_header.h)

#endif

The compiler invocation includes a -DOBJDIR=DO/variant/platform flag. This works most of the time, but may require you to #undef any macro that may be part of the OBJDIR string, as it will be faithfully but incorrectly resolved. The common ones are AIX, sparc, i386 ...

The name of the docking header file must be identical to the name of the generated header file, as a dependency is created automatically.