aggressive ninja | cache missing

motivation

ninja is a build system designed to run builds as fast as possible. By default it derives the number of parallel jobs from the number of available hardware threads. It is the most widely used build system in our company, and we also have a “build system framework” that is built on top of ninja and conan.

In this framework, dependency versions are managed in a configuration file. Every module (a library or a project) is a node in the dependency tree, and each module uses conan to manage its dependencies. A module only needs to name its dependencies; the versions come from the framework's configuration. Before building a module, the framework CLI generates a second conanfile (one that includes the dependency versions) from the module's conanfile and the configuration file, and this generated conanfile is then used to install the dependencies. Some dependencies are not pre-built, so we have to build them from source.

Here is the problem. The build system framework has a hard-coded concurrency strategy: it decides the number of concurrent jobs from the amount of free memory at the beginning of the build. Because our projects are very large, both the compile stage and the link stage consume a lot of memory, so OOM is a common issue in our builds. The framework also runs on the CI/CD servers, and if a build fails with OOM the developer has to re-run it, which is very annoying. So the toolchains team brought in a conservative concurrency strategy.

This strategy leaves the build machines idle much of the time. Developers constantly complain about slow builds, and the develop-and-debug cycle is very long. That is the motivation for this project: I want to patch ninja to support a better concurrency strategy and to ignore the jobs argument specified by the build system framework.

design

To collect system resource usage, I create a new thread that runs in the background. I rewrote the main build loop of ninja: if CPU usage is below a threshold, I increase the number of concurrent jobs; otherwise I decrease it, meaning that no new job is started until some running jobs have finished and CPU usage has dropped below the threshold. The system load is also taken into account in the algorithm.
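The admission decision described above can be sketched as a small pure function. This is only an illustration in Python, not the actual patch to ninja: the names `Limits` and `can_start_job`, the threshold values, and the exact way the load average is combined with CPU usage are all assumptions made for the example. The CPU and load samples are passed in as plain numbers, standing in for whatever the background thread collects.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # All values here are hypothetical defaults chosen for illustration.
    cpu_threshold: float = 90.0  # percent of total CPU considered "busy"
    load_per_core: float = 1.5   # allowed load average per core
    min_jobs: int = 1            # always keep at least this many jobs running
    max_jobs: int = 64           # hard upper bound on concurrent jobs


def can_start_job(running: int, cpu_percent: float, load_avg: float,
                  cores: int, limits: Limits) -> bool:
    """Decide whether the build loop may start one more job right now.

    `running` is the number of jobs currently in flight; `cpu_percent`
    and `load_avg` are the latest samples from the monitoring thread.
    """
    if running < limits.min_jobs:
        # Never stall completely, otherwise the build cannot make progress.
        return True
    if running >= limits.max_jobs:
        return False
    if cpu_percent >= limits.cpu_threshold:
        # CPU is saturated: wait for running jobs to finish first.
        return False
    if load_avg >= limits.load_per_core * cores:
        # System load is too high even though CPU usage looks acceptable.
        return False
    return True
```

With this shape, "decreasing" concurrency needs no explicit step: the loop simply stops admitting new jobs, and the number in flight falls as running jobs finish.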

If a compile or link process is killed by the OOM killer, it is restarted later. I implement this with a very simple approach: just check the stderr of the process. For example, when the C++ compiler is killed by the OOM killer, stderr contains "Killed signal terminated program cc1plus".

I also record the maxrss of every job and store it in rocksdb, so I can use this information to optimize the scheduling strategy. For example, I can pick the job with the lowest maxrss to run first.
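A minimal sketch of these two ideas, in Python for brevity rather than as the real ninja patch. Everything named here (`OOM_MARKERS`, `looks_oom_killed`, `order_by_maxrss`, `run_all`, the `max_retries` limit) is hypothetical, jobs run one at a time to keep the example short, and a plain dict stands in for the rocksdb store of recorded maxrss values.

```python
from collections import deque

# Substring to look for in stderr; taken from the example message above.
# Which markers to match in practice is an assumption of this sketch.
OOM_MARKERS = ("Killed signal terminated program",)


def looks_oom_killed(stderr_text: str) -> bool:
    """Return True if stderr suggests the process was killed by a signal."""
    return any(marker in stderr_text for marker in OOM_MARKERS)


def order_by_maxrss(jobs, maxrss_kb):
    """Sort jobs so the smallest recorded peak memory runs first.

    Jobs without a record sort last and keep their relative order.
    """
    return sorted(jobs, key=lambda job: maxrss_kb.get(job, float("inf")))


def run_all(jobs, run_job, maxrss_kb, max_retries=2):
    """Run every job; requeue the ones that look OOM-killed.

    `run_job(job)` must return (exit_code, stderr_text).
    Returns the list of jobs that failed for good.
    """
    pending = deque(order_by_maxrss(jobs, maxrss_kb))
    attempts = {}
    failed = []
    while pending:
        job = pending.popleft()
        exit_code, stderr_text = run_job(job)
        if exit_code == 0:
            continue
        attempts[job] = attempts.get(job, 0) + 1
        if looks_oom_killed(stderr_text) and attempts[job] <= max_retries:
            # Restart later, after other jobs have had a chance to finish.
            pending.append(job)
        else:
            failed.append(job)
    return failed
```

Matching on stderr is cheap but coarse: the same message appears when the compiler is killed for reasons other than memory pressure, so a retry limit like the hypothetical `max_retries` keeps a persistently failing job from looping forever.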

Another feature is automatic PCH (precompiled headers): aggressive ninja can generate a PCH for every source file and use it during compilation.