Autodiscovering Build System
Why cook?
Traditionally, software projects are built using a system based on the venerable make utility. This usually involves populating the source tree with Makefiles, each of which contains information about the ingredients for a build and the dependencies between them.
Makefiles are usually maintained by developers, which often involves collecting information from various sources and inserting it in the proper format. As this is prone to error, some automation was introduced, but usually in an ad-hoc manner and without recognizing that, short of a completely automated system, a build will always carry risks:
- Makefiles may refer to files not under version control. The build may work fine for the developer, but the release engineer won't be able to reproduce it.
- The build rules in the Makefiles may have dependencies on the developer's environment. Most common are dependencies on various search paths and the developer's own toolset. Again, the release engineer will not have the benefit of those tools.
- Maintaining the dependency information in the Makefiles is very tedious and fragile. A common result is that release engineers cannot trust incremental builds and rely on "make clean; make" to do their builds. This habit of course completely negates the original purpose of make, which is build avoidance.
Several attempts were made to address some of these issues:
- makedepend is a tool that will generate header file dependencies that are readable by make. Traditionally, the output of that tool was appended to the Makefile or included by it. Of course, that file had to be regenerated whenever there was a chance that dependencies may have changed. Still, systematically regenerating the dependency files is a huge improvement over "make clean; make".
- Imake and xmkmf are a logical extension of the makedepend idea. Their main advantage is that they provide a structure to centralize the definitions of various build rules, thereby avoiding dependencies on the developer's environment. Unfortunately, they are only reliable if all Makefiles are systematically regenerated at every build, and therefore performance is worse than "make clean; make".
- ClearCase provides its own attempt at solving this problem: clearmake. This tool performs so-called audited builds, taking advantage of the fact that ClearCase can trivially monitor file access while the build is going on. It stores the result of this audit in so-called configuration records (CR's) attached to every derived object (DO) created by the build. It uses the CR's for build avoidance and for winkin. A winkin happens when someone else created a DO that has the same ingredients as the DO you're attempting to create. In that case, no build is performed but the other DO is simply reused. For expensive DO's like libraries and executables, this can result in significant time savings.
Of all the attempts to make builds more reliable, ClearCase's clearmake is probably the most successful. There are many shops that use it with good results, and the configuration records can provide a valuable tool not only for recording what went into a build, but also for tracking down problems.
Nevertheless, clearmake comes at a price:
- clearmake is obviously limited to those platforms where ClearCase will run. The Makefiles are sufficiently compatible at a basic level so that you can run ordinary make, but then you can't take advantage of the dependency maintenance features and must revert to "make clean; make".
- Even on platforms supported by ClearCase, clearmake loses a lot of its power when run in snapshot views.
- clearmake in dynamic views is a resource hog. Those winkins come at a price, and that price is the saturation of your LAN. Considering that hard drive and processor speeds increased immensely while LAN speeds hardly budged in the last 5 years, a winkin nowadays may very well be slower than a local rebuild.
- Finally, clearmake is a proprietary product. It is bad enough that ClearCase is proprietary and thereby forces one to hand over a strategic asset to an outside company that may or may not choose to support you. There is no need to compound the problem by limiting your development options to those systems that have ClearCase installed.
Ordinary make and GNU make are handicapped by a long evolutionary history, made evident by a rather crufty syntax and a lack of fundamental capabilities, which would have been easy to implement in a clean slate design. Peter Miller's cook is such a design, and it has features not found in any make:
- Multi-part pattern rules and regular expression pattern rules;
- So-called cascaded dependencies: the ability to say: "if something depends on X, then that something also depends on Y";
- The ability to finely control the walking of the dependency tree via recipe gates, unconditional execution of parts of a recipe body and deliberate triggering of a new walk via a function. The latter is an extremely useful concept, as it allows one to generate cook include files prior to including them.
Why Autodiscovery?
The main goal is factorization. Data should not be needlessly replicated all over the place.
Much of the build data can be stored in the structure of the source tree. A directory that contains .c files is very likely to contain code for some library or executable. So, let's just add some conventions of positioning and naming of those directories and use that to determine our build.
References to other build objects can also be determined from the source code. For example, the #include's of header files can be used as a clue that the library associated with the header files is to be used.
Linkage to third party objects should be defined centrally, vastly simplifying the task of upgrading to new versions and avoiding having to re-explain over and over again how to link to certain complicated objects, e.g. SQL code generators and the like.
What you see is what you get. By using the source tree structure we avoid many of the pitfalls of classic Makefiles. We can impose simple build principles:
- Builds are "local" in the sense that they will not "install" stuff elsewhere. Builds may pull in stuff from elsewhere, but only in a well defined manner (executables pull libraries, packages pull in contents ...).
- Builds include all files under version control, and only those. If you see a source file under version control, you may assume it's participating. No need to double check. Obviously, we take full advantage of ClearCase's ability to move, rename and delete files in complete safety.
- Incremental builds always work. Well almost... cook-fix-manifests may need to be run whenever files are moved around or new platforms are added. The impact is kept to a minimum, and the fingerprinting feature of cook limits the damage.
- Developers use the exact same build process as the release engineers. This should cut down on the "it worked for me" type errors.
- Developers are gently nudged into compliance with some basic code layout principles. Doing "weird" things means more work, but if the "weird" thing is really useful, it will be worth the extra effort, since the new "trick" will be coded into a central location, ready to be reused.
Definitions
- derived object
A derived object (DO) is any file that is generated as the result of the build process. DO's are usually placed into DO subtrees and not mingled with the source files. This not only keeps the source tree clean but also allows for multiple parallel builds to avoid stepping on each other.
- platform
A platform is usually a combination of a particular operating system version with some kind of hardware. The name of the current platform is determined by running the ct-uname wrapper script, and all the logic for determining the platform name is encoded in that script.
A platform usually implies a specific set of compilers and third party tools. The current version of the build system does not support cross-compilation.
- variant
A variant is whatever you want it to be, but is distinguished from a platform in the sense that all variants on the same platform will usually share most tools and third party objects.
A variant name is usually composed of a set of attributes joined by the "_" (underscore) character. Various build rules and definitions may check for the existence of a certain attribute in the variant name and modify compiler flags and other things accordingly.
- top-level target
A top level target is usually one file composed out of an aggregation of other source files or DOs. This file is most often the final build product, or at least a component of the deliverable runtime environment. An executable, a library, a jar file are examples of top-level targets.
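The attribute convention for variant names can be illustrated with a short Python sketch. It is only an illustration: the function names and the sample variant "debug_profile" are hypothetical, and the real build system performs this kind of check inside its cook definitions.

```python
def variant_attributes(variant):
    """Split a variant name such as "debug_profile" into its attributes."""
    return [part for part in variant.split("_") if part]


def variant_has(variant, attribute):
    """Return True if attribute is one of the "_"-separated parts."""
    return attribute in variant_attributes(variant)
```

Matching on whole attributes rather than substrings avoids accidental hits: with this sketch, "prof" is not considered part of "debug_profile".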
Source Tree Layout
The source tree can take any form the development team finds convenient, as long as some simple rules are followed:
- Every top level target in the source tree is declared by creating a Howto.cook file in the top-most directory containing all the source files directly contributing to that top level target with a single line saying: "make target-name".
- A top level target (executable, library etc...) may only contain objects derived from source files located within the directory subtree containing the top level target definition.
- Every source file contributes to exactly one top level target. Source files are not to be linked, copied or otherwise included into multiple targets.
It may be tempting to make the structure of the source tree reflect the class inheritance hierarchy, but experience has shown that this is very hard to carry on consistently over time. Also, source code ends up being hard to find, especially for new people unfamiliar with the code.
In practice, it is often better to use a simple, flat layout like this:
projectdir/
|_DO/variant/platform
|_runtime/variant/platform
| |_bin/                Symlinks to executables
| |_lib/                Symlinks to shared objects
|_Howto.cook@ -> ../build/cook/include/main
|_src/
  |_DO/variant/platform
  |_lib/
  | |_DO/variant/platform
  | |_a-m/
  | | |_DO/variant/platform
  | | |_liba/
  | | | |_DO/variant/platform
  | | | |_Howto.cook    "make archive;"
  | | | |_class1.h
  | | | |_class1.c
  | | | |_class2.h
  | | | |_class2.c
  | | | | ...
  | | | |_Test/
  | | |   |_DO/variant/platform
  | | |   |_Howto.cook  "make test;"
  | | |   |_test1.h
  | | |   |_test1.c
  | | |   |_test2.h
  | | |   |_test2.c
  | | |   ...
  | | |
  | | |_libb/
  | |   |_DO/variant/platform
  | |   |_Howto.cook    "make archive;"
  | |   ...
  | |_n-z/
  |   |_DO/variant/platform
  |   |_libn/
  |   | |_DO/variant/platform
  |   | |_Howto.cook    "make archive;"
  |   | ...
  |   |_libo/
  |     |_DO/variant/platform
  |     |_Howto.cook    "make archive;"
  |     ...
  |_bin
    |_DO/variant/platform
    |_utils
    | |_DO/variant/platform
    | |_Howto.cook      "make simplex;"
    | |_util1.c
    | |_util2.c
    | ...
    |_bigprog
      |_DO/variant/platform
      |_Howto.cook      "make complex;"
      |_class1.h
      |_class1.c
      |_class2.h
      |_class2.c
      | ...
      |_Test/
        |_DO/variant/platform
        |_Howto.cook    "make test;"
        |_test1.h
        |_test1.c
        |_test2.h
        |_test2.c
        ...
Note that even though top level targets may not refer to files outside of their containing directories, they may be nested. Any source files belonging to the nested top level targets no longer belong to the containing top level target. In the source tree example above, the top level target src/bin/bigprog does not own any files in src/bin/bigprog/Test.
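The nesting rule can be expressed as "a source file belongs to the nearest enclosing directory that declares a top level target". The following Python sketch illustrates that rule only; the function name and the representation of target directories as a set of strings are assumptions made for the example, not part of the build system.

```python
from pathlib import PurePosixPath


def owning_target(source_file, target_dirs):
    """Return the deepest directory in target_dirs that contains source_file.

    target_dirs is the set of directories whose Howto.cook holds a
    "make ...;" line. Returns None if no such directory encloses the file.
    """
    path = PurePosixPath(source_file)
    for parent in path.parents:  # nearest enclosing directory first
        if str(parent) in target_dirs:
            return str(parent)
    return None
```

With the targets src/bin/bigprog and src/bin/bigprog/Test declared, a file under Test resolves to the nested target, while src/bin/bigprog/class1.c resolves to the containing one.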
Also note that every directory relevant to the build system has a DO subtree. Directories without a DO subtree appropriate for the selected variant and platform are ignored.
Note the runtime subtree at the top. This runtime tree is intended to be the root of a symlink farm which maps the shape of the production runtime tree to the existing development tree.
Finally, note that the layout above is only an example. The build system has no requirement that the source tree be organized in this way. You can choose to have deeply nested targets. The only restrictions that exist are:
- Any particular compiled object ends up in exactly one top level target - in other words, top level target source trees do not overlap;
- If you include a header file from top level target X, you (or your dependents) will link against X.
What follows is a description of all top level target types:
Simple Executable
A simple executable is exactly one file which gets compiled and linked into one executable. A directory with a Howto.cook file containing the line "make simplex;" will cause every file in the directory and subdirectories to be compiled into one executable each.
Complex Executable
A complex executable is a collection of source files which get compiled and linked together into one executable. A directory with a Howto.cook file containing the line "make complex;" will cause all files except the one with the same basename as the directory to be aggregated into one archive (static library). The remaining file is assumed to contain the main() function and is then linked against that library and any other libraries it may depend on.
The static library produced in this process is available for external linking, allowing for example the creation of test suites which exercise code in the complex executable. To gain the full benefit of this method, it is recommended to keep the size of the file containing the main() function as small as possible, since the code therein is not available for testing.
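As a rough illustration of the naming convention for complex executables, the Python sketch below separates the file assumed to hold main() from the files that go into the archive. The function name is hypothetical, and the caller is assumed to pass only compilable source files.

```python
import os


def split_complex(directory, source_files):
    """Separate the main() file from the files destined for the archive.

    The file whose basename (without extension) equals the directory name
    is treated as the one holding main(); everything else is archived.
    """
    dir_name = os.path.basename(os.path.normpath(directory))
    main_files = []
    archive_files = []
    for name in source_files:
        stem = os.path.splitext(os.path.basename(name))[0]
        if stem == dir_name:
            main_files.append(name)
        else:
            archive_files.append(name)
    return main_files, archive_files
```

For a directory src/bin/bigprog, a file named bigprog.c would be picked as the main() file under this convention.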
Archive (Static Library)
This is just your standard static library. A directory with a Howto.cook file containing the line "make archive;" will cause all files to be compiled and aggregated into a static library, available for external linking.
Shared Object
This is just like a library, except for being loaded at runtime instead of statically copied at link time. A directory with a Howto.cook file containing the line "make dump_so;" will cause all files to be compiled and aggregated into a shared object, available for external linking.
Loadable Module
A loadable module is a shared object which has all of its dependencies registered, so that they can be resolved at runtime. Effectively, the loadable module is linked like an executable, but behaves like a shared object. The loading program is no longer responsible for knowing about the dependencies of the object being loaded. The Howto.cook file in the directory of a loadable module must contain the line "make smart_so;".
Test Suite
A test suite is a collection of source files defining methods all called using a well-defined convention. A directory with a Howto.cook file containing the line "make test;" will cause a source file containing a main() function calling all of the test methods to be generated, compiled and linked together with all of the test methods and their dependencies.
Third Party Build
A directory containing a Howto.cook file with the line "make sub_build;" will cause a hand-off to a third party build system. The same directory must contain a build.sh shell script which executes the handoff.
When the build system processes a sub_build top level target, all of the sources which belong to the sub_build target are incrementally copied into the DO subtree, then the current working directory is set to the DO subtree and the build.sh script is executed.
Everything the sub_build writes to stdout is stored in a log file. This log file is also the actual product of the sub_build, and any dependencies between the third party build and any other component of the source tree must be expressed as a dependency to or from that log file.
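The hand-off sequence (incremental copy, change of directory, run build.sh, capture stdout) can be sketched in Python as follows. This is a simplified illustration rather than the build system's own code: the function names and the log file name sub_build.log are hypothetical, only plain files directly in the target directory are copied, and a POSIX sh is assumed to be on the PATH.

```python
import os
import shutil
import subprocess


def sync_sources(src_dir, do_dir):
    """Copy files that are missing or older in do_dir; return copied names."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(do_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch ignores subdirectories, including DO/
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            shutil.copy2(src, dst)  # copy2 preserves the modification time
            copied.append(name)
    return copied


def run_sub_build(src_dir, do_dir, log_name="sub_build.log"):
    """Sync sources, run build.sh inside do_dir, capture stdout in a log."""
    os.makedirs(do_dir, exist_ok=True)
    sync_sources(src_dir, do_dir)
    log_path = os.path.join(do_dir, log_name)
    with open(log_path, "w") as log:
        result = subprocess.run(["sh", "build.sh"], cwd=do_dir, stdout=log)
    return result.returncode, log_path
```

Because the log file is the product other targets depend on, a second run with unchanged sources copies nothing; whether build.sh is rerun is then up to the dependency rules on that log file.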
Running Cook
Cook is always invoked at the top level of the source tree. This is the highest location containing a Howto.cook file, which is termed the top level Howto.cook file. Note that all the paths used within the Howto.cook files are relative paths, rooted at this top level.
In order to avoid having to constantly change directories, a wrapper script called "b" is provided which will do the switching for you, while redirecting the build log to the current directory and passing in the appropriate command line options to limit the build to the targets in your current directory.
In addition to the standard cook options which can be listed by saying cook --help, the following options are peculiar to this build system:
- top=location
- Limit the build to targets contained in the specified location. The "b" wrapper uses this to pass in the current location.
- variant=variant
- Build specified variant. The default variant is debug.
- fast=y
- Assume all the dependency files are up to date and skip the verification.
- skip=y
- Skip the compilation of any files outside of the directories specified in top=location, usually the current directory and its subdirectories. This may cause inconsistencies, but this option is provided for developers who claim to know when a rebuild of files outside the current directory is not required.
In addition, any object file in the current directory may be created directly by specifying it as a build target.
Phases
Autodiscovery is actually quite complex, since it needs to bootstrap itself from a simple, top level invocation. In addition, the autodiscovery process needs to be as efficient as possible. This efficiency is achieved mainly by splitting the information into many small files and organizing the generation of these files using the same techniques as are used for the build proper. This means that files are only rebuilt when some of their sources have changed.
What follows is a detailed description of the autodiscovery phases.
Initialization
The top level Howto.cook file is essentially a skeleton. The meat is provided by various include files which live under build/cook/include. The top level Howto.cook file will read in all of the include files in the following directories, ignoring files which look like editor backup files:
- build/cook/include/functions
- Files in this directory define various cook functions used in various places. The standard is one file per function, the file name being the same as the function name.
- build/cook/include/defs
- Files in this directory define which operations actually exist and what their command line syntax and options are. They also define the default values for these options.
- build/cook/include/collectors
- Files in this directory contain code which maps the set of source files to the set of object files and dependency files.
- build/cook/include/make
- Files in this directory define the supported top level targets. For every top level target, functions which group the set of object files and assign them to a target are defined, as are various parameters governing the creation of link dependencies.
- build/cook/include/use
- Files in this directory define various canned settings for using third party tools or for building special kinds of objects. The values therein will override the defaults defined in build/cook/include/defs.
- build/cook/include/phase_n
- Files in these directories contain cook recipes which define how files are compiled and linked. These files are included incrementally as the build progresses through its various phases.
All build rules depend on the files containing the definitions they use. Changing a definition file will cause all DO's produced by rules using these definitions to be rebuilt.
Source File Collection (Phase 1)
This is the only recursive part of the build. The end result of this phase is that the build system has a list of source files. This list will only include files that are:
- under version control (if this feature is turned on);
- in a directory that contains a DO subtree which is valid for the current variant/platform combination.
The DO tree is itself under version control and looks like this: DO/variant/platform. This tree not only serves as a repository for DO's, but mainly serves to make selective parts of the source tree visible or invisible to specific variant/platform combinations. Use this instead of #ifdef's.
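The visibility rule of this phase can be sketched in Python as a single tree walk. This is a minimal illustration under stated assumptions: the function name is hypothetical, the version control check is left out, and the sketch keeps descending below a directory that lacks a matching DO subtree, which the text above does not specify.

```python
import os


def collect_sources(top, variant, platform):
    """Return files in directories that have a DO/<variant>/<platform> subtree."""
    sources = []
    for root, dirs, files in os.walk(top):
        visible = False
        if "DO" in dirs:
            dirs.remove("DO")  # never descend into derived-object trees
            visible = os.path.isdir(os.path.join(root, "DO", variant, platform))
        if visible:
            sources.extend(os.path.join(root, name) for name in files)
    return sorted(sources)
```

A directory whose DO subtree exists only for another platform contributes nothing for the current one, which is how a subtree can be hidden from a platform without resorting to #ifdef's.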
Local Customizations (Phase 2)
Any non-DO directory in the source tree may contain a Howto.cook file. This file contains definitions that are only valid in that directory and all of its subdirectories. The customization phase translates those Howto.cook files in order to localize all variable assignments and to parse references to modules. All DO's built in that directory and its subdirectories have a dependency to that file, so changing that file will result in those DO's being rebuilt.
A local Howto.cook file may contain any valid cook rule or statement, although care must be taken to remember that all paths are rooted from the position of the top level Howto.cook file. The following constructs have special meaning and are parsed and expanded by the customization process:
- variable += value;
- This creates a localized variable, with value appended to the current value of this variable from a higher level Howto.cook file, or to the default value as defined in build/cook/include/defs. Note that the whole assignment must be in one line. The parser is somewhat stupid about this.
- variable := value;
- This is like the += construct, except that the value replaces the previous value instead of being appended.
- make top level target;
- This causes all source files in this directory and its subdirectories to be associated with the specified top level target. If a subdirectory contains a Howto.cook file with its own make statement, then the files associated with that subtree are removed from the top level target of the containing directory. This allows one to have nested top level targets. The definitions associated with top level targets are stored in build/cook/include/make.
- use module;
- Use the canned settings defined in the specified module. Modules are stored in build/cook/include/use and allow a central definition of all the flags and settings required to link to third party objects or rarely used system libraries.
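To make the four special constructs concrete, here is a small Python sketch of how such one-line statements could be recognized and applied. It is not the customization process itself: the regular expressions, the function names and the variable name cc_flags are assumptions for illustration, and names are assumed to consist of letters, digits and underscores.

```python
import re

ASSIGN = re.compile(r"^\s*(\w+)\s*(\+=|:=)\s*(.*?)\s*;\s*$")
MAKE = re.compile(r"^\s*make\s+(\w+)\s*;\s*$")
USE = re.compile(r"^\s*use\s+(\w+)\s*;\s*$")


def classify(line):
    """Classify one Howto.cook line; statements must fit on a single line."""
    m = ASSIGN.match(line)
    if m:
        kind = "append" if m.group(2) == "+=" else "replace"
        return (kind, m.group(1), m.group(3))
    m = MAKE.match(line)
    if m:
        return ("make", m.group(1))
    m = USE.match(line)
    if m:
        return ("use", m.group(1))
    return ("other", line.rstrip("\n"))


def apply_statement(inherited, parsed):
    """Return a localized copy of the inherited settings with parsed applied."""
    local = dict(inherited)  # the copy keeps the parent directory untouched
    if parsed[0] == "append":
        local[parsed[1]] = (local.get(parsed[1], "") + " " + parsed[2]).strip()
    elif parsed[0] == "replace":
        local[parsed[1]] = parsed[2]
    return local
```

Working on a copy mirrors the idea that an assignment in a local Howto.cook affects only that directory and its subdirectories, never the level above.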
Include Dependencies (Phase 3)
This is the grandchild of makedepend. It is extremely fast, but relies on some heuristics that need to be understood in order to maintain the integrity of the incremental builds.
Every C/C++ source file is scanned for #include directives, and a so-called "cascade dependency" is generated for every file referenced in a #include, provided that it exists in the source tree.
A cascade dependency is a cook speciality, essentially a rule that says: "If you need file X, you will also need file Y." In other words, as soon as a dependency "A: X" is detected, an "A: Y" dependency is automatically added. Cascade dependencies allow for a flat scanning of header files: each file is scanned once, without recursively following the files it references, which would be very wasteful, as many header files would be parsed over and over again.
There are two problems with this flat scanning method:
- What if the #include was #ifdef'ed out (i.e. it wouldn't be really included)?
- What if the included header file had to be generated?
The first problem is solved essentially by ignoring it. Should a #include reference a file that exists, then the additional dependency can't really hurt. It may cause a DO to be rebuilt unnecessarily, but that's it. If the referenced header file doesn't exist, then no dependency is generated, and either the reference is actually #ifdef'ed out, or there will be a compiler error, which is just as well.
The case of generated header files is more difficult and will be addressed in detail in the code generator section. Essentially, the problem is solved by creating a so-called docking header file that always exists and that will include the generated header file. This way, a dependency will exist and everything will be fine again.
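The flat scanning heuristic can be illustrated with the following Python sketch. It is an approximation for explanatory purposes: the regular expression, the function names and the use of a list of search directories to locate headers are assumptions, not the scanner's actual implementation.

```python
import os
import re

# Matches #include "x.h" and #include <x.h>, regardless of any #ifdef around it.
INCLUDE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]', re.MULTILINE)


def scan_includes(text):
    """Return every header name mentioned in a #include directive."""
    return INCLUDE.findall(text)


def cascade_dependencies(source_path, text, search_dirs):
    """Return (source_path, header) pairs for includes that exist on disk."""
    deps = []
    for name in scan_includes(text):
        for directory in search_dirs:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                deps.append((source_path, candidate))
                break  # headers that are not found produce no dependency
    return deps
```

Note that an include written through a macro, as in a docking header, is not matched by this pattern; only the docking header itself, which always exists, is picked up.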
Library Dependencies (Phase 4)
Once all the header file dependencies are known, these dependencies are matched with the known locations of the top level targets, and appropriate cascade dependencies (if you use libA, you need libB) and real dependencies (if libA changes, relink) are created.
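The matching step can be pictured as a lookup from each included header to the top level target that owns it. The Python sketch below shows only this lookup; the function name and the two dictionaries used as input are hypothetical data structures chosen for the example.

```python
def library_dependencies(includes_by_target, owner_of_header):
    """Map each target to the set of other targets whose headers it includes.

    includes_by_target: target name -> iterable of included header paths.
    owner_of_header: header path -> name of the target that owns it.
    """
    deps = {}
    for target, headers in includes_by_target.items():
        used = {owner_of_header[h] for h in headers if h in owner_of_header}
        used.discard(target)  # a target does not depend on itself
        deps[target] = used
    return deps
```

Headers with no known owner, such as system headers, simply produce no library dependency in this sketch.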
Program Dependencies (Phase 5)
Program dependencies on libraries are generated by creating the transitive closure of all library dependencies of the libraries used by that program. A simple topological sort of all dependencies will yield a linear search order for libraries that can be used to link and produce the executable.
Since a topological sort is used, it is essential that there be no cyclic dependencies between libraries. There shouldn't be any. If a cycle is encountered, it should be broken up, either by isolating the cycle and putting it into a separate library, or by merging the mutually dependent libraries into one big library.
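The closure and sort can be sketched with Python's standard graphlib module (Python 3.9 or later). This is an illustration of the technique, not the build system's code; the function name and the dictionary representation of library dependencies are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def link_order(program_libs, lib_deps):
    """Return the libraries a program needs, dependents before dependencies.

    program_libs: libraries the program uses directly.
    lib_deps: dict mapping a library to the libraries it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Transitive closure: collect every library reachable from the program.
    needed, stack = set(), list(program_libs)
    while stack:
        lib = stack.pop()
        if lib not in needed:
            needed.add(lib)
            stack.extend(lib_deps.get(lib, ()))
    graph = {lib: set(lib_deps.get(lib, ())) for lib in needed}
    try:
        # static_order() lists dependencies first; a one-pass linker wants
        # dependents first, so the result is reversed below.
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError("cyclic library dependency: %s" % (err.args[1],)) from err
    return order[::-1]
```

A cycle surfaces as an error naming the libraries involved, which is the point at which the cycle has to be broken up or the libraries merged.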
Build (Phase 6)
At this stage, all the dependencies are known and the build proper can proceed.
Inspecting and Debugging the Build
When creating a DO, cook will also create two files which help in inspecting and debugging the build. One file contains the list of all files required as ingredients for building the file in question. This file has the same name as the DO, with a ".need" extension tacked on to it. The other file lists those files which caused the DO to be rebuilt. This one has the ".why" extension appended to the DO name.
When operating under ClearCase, you can recursively track down the .need or the .why files using the ct-catcr script. This script will dump out the version extended names of all the source files which contributed towards the DO. By keeping a mapping between this output and the DO, one can reliably track whether a DO has changed between builds and produce patches containing only changed DOs.
Customizing the Build System
Every shop is different, so customizations are inevitable. The system is designed to be easily adaptable.
File Map of the Build System
The layout of the build system, as it is made available in the download section, is as follows:
build/
|_cook/
  |_bin/platform/         Pre-built cook binaries (Currently only Linux)
  |_usage
  |_version
  |_helper/               Helper scripts, mostly in perl;
  |_include/
    |_main                Intended target for the top level Howto.cook
    |                     symlinks in the actual development trees;
    |_suffixes/platform   File suffixes valid on platform;
    |_paths/platform      Paths to system commands and executables valid
    |                     on platform;
    |_defs/               Definitions of customizable, language specific
    |                     variables (e.g. compiler flags);
    |_use/                Module definitions;
    |_make/               Top-Level target definitions;
    |_rules/              Actual cook rules.
Modules
Autodiscovery can only go so far. There are too many third party objects that all have their own idiosyncrasies and conventions, and autodiscovering which third party tool is used is not practical or even feasible in general.
The module construct exists to tell the build system how to use a third party tool. If a module exists, a developer can easily create a reference to a third party tool or library by inserting the following line in the Howto.cook file that lives in the directory containing the code requiring the module:
use module;
A module consists of definitions of variables prefixed by the module name. The following variables may be used, and any value specified here will be appended to the defaults.
- module_cc_D_flags
- -D flags passed to the C compiler. Must be specified as "'-DNAME=value'".
- module_cc_I_flags
- -I flags passed to the C compiler. Must be specified as -Ipath/to/someplace.
- module_cc_flags
- All other flags passed to the C compiler.
- module_cpp_D_flags
- -D flags passed to the C++ compiler. Must be specified as "'-DNAME=value'".
- module_cpp_I_flags
- -I flags passed to the C++ compiler. Must be specified as -Ipath/to/someplace.
- module_cpp_flags
- All other flags passed to the C++ compiler.
- module_ld_l_flags
- -l flags passed to the linker. Must be specified as -llib.
- module_ld_L_flags
- -L flags passed to the linker. Must be specified as -Lpath/to/someplace.
- module_ld_flags
- All other linker flags.
- module_esql_I_flags
- -I flags passed to the embedded SQL preprocessor. Must be specified as -Ipath/to/someplace.
- module_external_obj
- List of full paths to single external object files that must be linked into the executable.
- module_external_lib
- List of full paths to single libraries that must be linked into the executable.
The following variables override the defaults:
- module_esql
- Name of the embedded SQL preprocessor.
- module_pre_esql
- Commands or environment variable definitions that must be prepended to the invocation of the embedded SQL preprocessor.
- module_ld
- Name of the linker to be used.
- module_ar
- Name and flags of the archiver to be used. This should be used to control the creation of dynamic vs. static libraries.
- module_lib_suffix
- Extension of the archive created by module_ar.
As new ways of building things are added, more variables may be defined. These definitions, together with their default values, reside in build/cook/include/defs.
These definitions may reside in several places, and are read in this order:
- In the top level Howto.cook file, for local, non-generic tweaks of a build.
- In build/cook/include/use/, for reusable tweaks, e.g. to link against a commonly used third party product. The easiest way to do this is to copy an existing module and modify it to support your new method.
- In build/cook/include/make/, for tweaks required to generate a new kind of top level build target, e.g. perl XS modules. This usually requires deeper changes like adding new rules and collecting dependencies. It may be easiest to simply hand off by using the sub_build target.
Definitions for a specific module should reside in a file that has the same name as the module, but don't have to. If they do, a dependency to that file will be generated so that a change to that file will cause all targets using that module to be rebuilt.
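The difference between appended and overriding module variables can be sketched in Python as follows. This is an illustration under assumptions: the function name is hypothetical, the module name "oracle" and all flag values are made up, and the defaults are assumed to be keyed by the variable name without the module prefix.

```python
APPENDED = {
    "cc_D_flags", "cc_I_flags", "cc_flags",
    "cpp_D_flags", "cpp_I_flags", "cpp_flags",
    "ld_l_flags", "ld_L_flags", "ld_flags",
    "esql_I_flags", "external_obj", "external_lib",
}
OVERRIDING = {"esql", "pre_esql", "ld", "ar", "lib_suffix"}


def apply_module(defaults, module, module_vars):
    """Return the defaults with one module's variables merged in."""
    result = dict(defaults)
    prefix = module + "_"
    for name, value in module_vars.items():
        if not name.startswith(prefix):
            continue  # variable belongs to some other module
        key = name[len(prefix):]
        if key in OVERRIDING:
            result[key] = value
        elif key in APPENDED:
            result[key] = (result.get(key, "") + " " + value).strip()
    return result
```

For example, a module that sets oracle_ld_l_flags adds to whatever -l flags are already in effect, while oracle_ld replaces the linker name outright.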
How to Support New Platforms
To support a new platform, first edit the get_platform script in the build/cook/helper directory, ensuring that it returns a valid and unique platform string. This string is used as the platform directory name for platform specific directories (e.g. DO subtrees) in various locations, which will need to be created as needed.
Next, copy and edit the files in build/cook/include/suffixes and build/cook/include/paths, ensuring that all the suffix and path definitions are valid for the new platform.
Finally, any platform specific language definitions located in the build/cook/include/defs directory need to be adapted.
How to Use New Languages
How to Define New Top-Level Build Targets
Code Generators
Currently, there is support for lex, yacc and various embedded SQL preprocessors. Source files for these preprocessors reside in the same place as C/C++ files, and the preprocessors are invoked prior to the include dependency phase.
There is one aspect that requires a little bit of care: generated header files. Since include dependencies to non-existing files are ignored, and since generated header files live in the DO subtree, a so-called docking header file must be used to access the generated header file. A docking header file looks like this:

    #ifndef _generated_header_h_
    #define _generated_header_h_

    /* The following macro "stringifies" a constant. This allows constants to be
     * passed in via compiler flags without using fancy quoting mechanisms. */

    /* Go ask the ANSI committee why you need two levels of indirection here */
    #ifndef cook_str
    #define cook_str(s) # s
    #define cook_string(s) cook_str(s)
    #endif

    /* Now include the generated header file */
    #include cook_string(OBJDIR/generated_header.h)

    #endif
The compiler invocation includes a -DOBJDIR=DO/variant/platform flag. This works most of the time, but may require you to #undef any macro that may be part of the OBJDIR string, as it will be faithfully but incorrectly resolved. The common ones are AIX, sparc, i386 ...
The name of the docking header file must be identical to the name of the generated header file, as a dependency is created automatically.
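Since every docking header follows the same pattern, writing one by hand can be avoided with a small generator. The Python sketch below is a hypothetical helper, not part of the build system: the function name, the include guard derivation and the optional list of macros to #undef are assumptions based on the example and the caveat above.

```python
import re

TEMPLATE = """\
#ifndef {guard}
#define {guard}

#ifndef cook_str
#define cook_str(s) # s
#define cook_string(s) cook_str(s)
#endif

{undefs}#include cook_string(OBJDIR/{name})

#endif
"""


def docking_header(name, undefs=()):
    """Return the text of a docking header for the generated header `name`.

    undefs lists macros (for example "i386") that would otherwise be
    expanded inside the OBJDIR string and must be undefined first.
    """
    guard = "_" + re.sub(r"\W", "_", name) + "_"
    undef_lines = "".join("#undef %s\n" % macro for macro in undefs)
    return TEMPLATE.format(guard=guard, undefs=undef_lines, name=name)
```

The returned text would be written to a file with the same name as the generated header, placed next to the other sources.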
