Add support for JSON compilation database generation and maintenance

See the "Compilation Database" section in the "cc Module" chapter of the manual for details.
author: Boris Kolpackov <boris@codesynthesis.com> 2024-08-28 09:36:16 +0200
committer: Boris Kolpackov <boris@codesynthesis.com> 2024-10-09 10:06:21 +0200
commit: eeb155ebc35c5947234f731c333e2bd71ea88974 (patch)
tree: d2784e072b1770b3d30587f97eb4b72b7ef3e765 /doc/manual.cli
parent: 8384a087afc7e29e900a3ce96d55ab2f5c2a74c2 (diff)
1 files changed, 319 insertions, 9 deletions
diff --git a/doc/manual.cli b/doc/manual.cli
index 07d816a..03fa04a 100644
--- a/doc/manual.cli
+++ b/doc/manual.cli
@@ -2093,7 +2093,7 @@ If we forget to adjust the \c{missing-name} test, then this is what we could
 expect to see when running the tests:
 
 \
-b test
+$ b test
 c++ hello/cxx{hello} -> hello/obje{hello}
 ld hello/exe{hello}
 test hello/exe{hello} + hello/testscript{testscript}
@@ -6700,8 +6700,8 @@ quickly re-run a previously failed test), it can also be persisted in
 subset of tests by default. For example:
 
 \
-b test config.test=foo/exe{driver} # Only test foo/exe{driver} target.
-b test config.test=bar/baz         # Only run bar/baz testscript test.
+$ b test config.test=foo/exe{driver} # Only test foo/exe{driver} target.
+$ b test config.test=bar/baz         # Only run bar/baz testscript test.
 \
 
 The \c{config.test} variable contains a list of \c{@}-separated pairs with the
@@ -6712,14 +6712,14 @@ name. Otherwise \- an id path. The targets are resolved relative to the root
 scope where the \c{config.test} value is set. For example:
 
 \
-b test config.test=foo/exe{driver}@bar
+$ b test config.test=foo/exe{driver}@bar
 \
 
 To specify multiple id paths for the same target we can use the pair
 generation syntax:
 
 \
-b test config.test=foo/exe{driver}@{bar baz}
+$ b test config.test=foo/exe{driver}@{bar baz}
 \
 
 If no targets are specified (only id paths), then all the targets are tested
@@ -6741,9 +6741,9 @@ and the right hand side \- for individual tests. The zero value clears the
 previously set timeout. For example:
 
 \
-b test config.test.timeout=20   # Test operation.
-b test config.test.timeout=20/5 # Test operation and individual tests.
-b test config.test.timeout=/5   # Individual tests.
+$ b test config.test.timeout=20   # Test operation.
+$ b test config.test.timeout=20/5 # Test operation and individual tests.
+$ b test config.test.timeout=/5   # Individual tests.
 \
 
 The test timeout can be specified on multiple nested root scopes. For example,
@@ -6759,7 +6759,7 @@ specifying the \c{config.test.runner} variable. Its value has the \c{<path>
 [<options>]} form. For example:
 
 \
-b test config.test.runner=\"valgrind -q\"
+$ b test config.test.runner=\"valgrind -q\"
 \
 
 When the runner program is specified, commands of simple and Testscript tests
@@ -7648,6 +7648,12 @@ config.cc.reprocess
   cc.reprocess
 
 config.cc.pkgconfig.sysroot
+
+config.cc.compiledb
+config.cc.compiledb.name
+config.cc.compiledb.filter
+config.cc.compiledb.filter.input
+config.cc.compiledb.filter.output
 \
 
 Note that the compiler mode options are \"cross-hinted\" between \c{config.c}
@@ -8054,6 +8060,310 @@ As a result, it should only be used for dealing with issues in third-party
 installation} should be used instead.|
 
 
+\h#cc-compiledb|Compilation Database|
+
+The \c{cc}-based modules provide support for generating and maintaining the
+\l{https://clang.llvm.org/docs/JSONCompilationDatabase.html JSON Compilation
+Database} which can be used by other tools (static analyzers, language
+servers, IDEs, etc) to understand how a codebase is compiled. \"Maintaining\"
+in the previous sentence means that if new source files get added to the
+project or old ones removed, or if any compilation options change, then the
+corresponding entries in the compilation database will be automatically
+updated when you update your project. This helps keep the database in sync
+with the project state.
+
+The generation of compilation databases and their configuration are controlled
+with a number of \c{config.cc.compiledb.*} variables. The
+\c{config.cc.compiledb} variable provides a simplified interface that enables
+the generation of one database per project with the resulting database
+containing entries for all the source and object files. The rest of the
+variables provide a more flexible interface that allows you to generate
+multiple databases in different locations as well as filter the entries that
+end up in each database.
+
+Let's start with the simplified interface as provided by
+\c{config.cc.compiledb}. The value of this configuration variable is a single
+\ci{name} or a \ci{name} and \ci{path} pair in the \c{\i{name}[@\i{path}]}
+form.
+
+The \ci{name} part is the compilation database name that can be used to refer
+to it in filters (see below). If \ci{path} is absent or is (syntactically) a
+directory, then \ci{name} is also used to derive the compilation database file
+name by appending the \c{.json} extension to it.
+
+If \ci{path} is absent, then the compilation database is placed into the
+top-level amalgamation that loads any \c{cc}-based module. Otherwise, the
+database is placed into the specified location.
+
+The special \c{-} name is interpreted as an instruction to dump the database
+to \c{stdout}.
+
+Let's see some examples of using \c{config.cc.compiledb} to handle a few
+common scenarios. Here we will use \l{bdep(1)} to create amalgamations
+(configurations) and configure (initialize) one or more projects. We will
+assume we have the \c{hello} and \c{libhello} projects created like this:
+
+\
+$ bdep new -t exe hello
+$ bdep new -t lib libhello
+\
+
+The most common scenario is likely having a compilation database per
+project:
+
+\
+$ cd libhello
+$ bdep config create ../build-gcc @gcc cc config.cxx=g++
+$ bdep init @gcc config.cc.compiledb=libhello
+$ cd ..
+
+$ cd hello
+$ bdep config add ../build-gcc @gcc
+$ bdep init @gcc config.cc.compiledb=hello
+$ cd ..
+
+$ b hello/ libhello/
+\
+
+\N|Or if you prefer to create/add configuration as part of \c{init} (notice
+the \c{--} separator):
+
+\
+$ bdep init -C ../build-gcc @gcc cc config.cxx=g++ -- \\
+  config.cc.compiledb=libhello
+
+$ bdep init -A ../build-gcc @gcc config.cc.compiledb=hello
+\
+
+|
+
+After the update (the last command), we will have \c{hello.json} and
+\c{libhello.json} in \c{build-gcc/} which contain the compilation command
+lines for each project.
+
+\N|Only source files that are compiled end up being added to the compilation
+database.
+
+To illustrate this point, let's assume our \c{hello} project imports and links
+\c{libhello}. And instead of updating both as in the above example, we will
+first update only \c{hello}:
+
+\
+$ b hello/
+\
+
+In this case \c{libhello.json} will still be generated but it will only
+contain a subset of the expected entries \- only those whose compilation was
+triggered by updating \c{hello}. The missing entries can be added by updating
+\c{libhello}:
+
+\
+$ b libhello/
+\
+
+|
+
+In the above setup it feels natural to name each database after the project
+and place them into the output directory. However, some consumers, such as
+IDEs, may not handle this setup well. Specifically, they may only recognize
+the canonical \c{compile_commands.json} file as the compilation database,
+opening all other files as generic JSON. They may also assume the directory
+where this file resides to be the project source directory root. To accommodate
+these assumptions we can instead place each database into the project's
+source directory and call it \c{compile_commands.json}:
+
+\
+$ bdep init @gcc config.cc.compiledb=libhello@./compile_commands.json
+
+$ bdep init @gcc config.cc.compiledb=hello@./compile_commands.json
+\
+
+Note that in this case it will be your responsibility to remove the database
+files if and when necessary. \N{\l{bdep-new(1)} adds \c{compile_commands.json}
+to the \c{.gitignore} file it generates.}
+
+If instead of having a separate database for each project we wanted to place
+all the entries into a single database, then the relevant commands would
+change as follows:
+
+\
+$ bdep init @gcc config.cc.compiledb=compiledb
+
+$ bdep init @gcc config.cc.compiledb=compiledb
+\
+
+This would give us a single \c{build-gcc/compiledb.json} that contains the
+compilation command lines for both projects.
+
+In the above example only \c{hello} and \c{libhello} will end up in the
+database, but not any of their dependencies. What if we wanted entries for
+everything in \c{build-gcc/}? In this case, we should enable the compilation
+database for the entire configuration rather than for individual projects:
+
+\
+$ bdep config create ../build-gcc @gcc cc \\
+  config.cxx=g++                          \\
+  config.cc.compiledb=compiledb
+$ bdep init @gcc
+
+$ bdep config add ../build-gcc @gcc
+$ bdep init @gcc
+\
+
+If multiple linked configurations are involved, then we would often want
+projects initialized in different configurations to share the compilation
+database. The representative scenario here is a tool, such as a source code
+generator, which is initialized in the host configuration, and its runtime
+library plus tests/examples, which are initialized in the target
+configuration. Let's assume that in our example \c{hello} is the tool and
+\c{libhello} is the runtime library and both are part of the same project.
+This is how we can arrange for them to share the compilation database:
+
+\
+$ bdep config create @host ../host-gcc --type host cc config.cxx=g++
+$ bdep config create @target ../build-gcc cc config.cxx=g++
+
+$ bdep init @host -d hello config.cc.compiledb=hello@../build-gcc/
+$ bdep init @target -d libhello config.cc.compiledb=hello
+
+$ bdep update @host @target
+\
+
+With this setup the \c{hello.json} database in \c{build-gcc/} will contain
+entries for both \c{hello} and \c{libhello}.
+
+If instead of configuring and maintaining the compilation database in a file
+you want to dump it somewhere once, the recommended approach is to write it
+to \c{stdout}. For example:
+
+\
+$ b -n hello/ libhello/ config.cc.compiledb=- >/tmp/compiledb.json
+\
+
+Note that writing to \c{stdout} forces recompilation of all the targets that
+would be updated in order to make sure their entries end up in the database.
+If you don't want the actual recompilation, then you can use the dry run mode
+(\c{-n} option above).
+
+\N|If your projects are spread across multiple linked configurations and you
+would like to get compilation command lines for all of them, then use the
+global override for \c{config.cc.compiledb}:
+
+\
+$ b '!config.cc.compiledb=-' ...
+\
+
+As mentioned earlier, the entries that will end up in such a database are
+determined by what gets updated.|
+
+Let's now turn to the rest of the \c{config.cc.compiledb.*} configuration
+variables that provide a lower-level but more flexible interface. The
+following listing shows their synopsis:
+
+\
+config.cc.compiledb.name           =  <name>[@<path>]...
+config.cc.compiledb.filter         =  [<name>@]<bool>...
+config.cc.compiledb.filter.input   =  [<name>@]<target-type>...
+config.cc.compiledb.filter.output  =  [<name>@]<target-type>...
+\
+
+The \c{config.cc.compiledb.name} variable specifies the name and location of
+one or more compilation databases. The semantics of the
+\c{\i{name}[@\i{path}]} pair is the same as in \c{config.cc.compiledb}
+discussed above, except that if \ci{path} is absent, then the database is
+placed into the project being configured rather than into the top-level
+amalgamation.
+
+Also, unlike \c{config.cc.compiledb}, this variable does not automatically
+enable writing to the specified databases. Instead, this is the job of
+\c{config.cc.compiledb.filter}. Splitting this logic into two steps allows us
+to configure the database name/location in one place, typically an outer
+amalgamation, and then enable writing to it in other places, typically
+specific subprojects.
+
+The \c{config.cc.compiledb.filter.{input,output\}} variables allow us to
+filter the entries that end up in the databases based on the input (\c{c{\}},
+\c{cxx{\}}, etc) and output (\c{obja{\}}, \c{objs{\}}, etc) target types.
+
+Note that in all three \c{.filter} variables the values are examined in the
+reverse order and the first entry that matches determines the outcome.
+Entries without \ci{name} apply to all databases and the target types are
+matched taking into account inheritance (so \c{target{\}} will match any type)
+and groups (so \c{obj{\}} will match any \c{obj[eas]{\}}). If no target type
+filter (input or output) is specified, then no corresponding target filtering
+is performed.
+
+\N|The \c{config.cc.compiledb=<name>} semantics can be expressed as the
+following set of lower-level variables:
+
+\
+config.cc.compiledb.name           = <name>@../path/to/amalgamation/
+config.cc.compiledb.filter        += <name>@true
+config.cc.compiledb.filter.input  += <name>@target
+config.cc.compiledb.filter.output += <name>@target
+\
+
+The last three assignments only apply if the corresponding variable is not set
+to a custom value for this project.|
+
+Let's look at a few examples of using these lower-level configuration
+variables. The common use for the output target filtering is getting rid of
+\c{obja{\}} or \c{objs{\}} entries in libraries. Unless configured otherwise,
+when we build a library we end up with both static and shared variants. And
+this means that each source file for the library is compiled twice, once to
+produce \c{obja{\}} for the static library and once to produce \c{objs{\}}
+for the shared one. That, in turn, means we get two compilation database
+entries for each such source file. If we don't want that for some reason (for
+instance, because the consumer of the database does not handle this well),
+then we can filter one of them out. For example, below is how we can
+initialize \c{libhello} to achieve this (notice that we also include
+\c{obje{\}} to keep object files for executables, such as tests):
+
+\
+$ bdep init @gcc               \\
+  config.cc.compiledb=libhello \\
+  config.cc.compiledb.filter.output='obje objs'
+\
+
+As an example of the input target type filtering, below is how we can keep
+entries only for the C and C++ source files, filtering out everything else
+(assembler, Objective-C/C++), for instance, because the consumer of our
+database does not recognize them:
+
+\
+$ bdep init @gcc               \\
+  config.cc.compiledb=libhello \\
+  config.cc.compiledb.filter.input='c cxx'
+\
+
+As an example of a more advanced configuration, consider a compilation
+database for a project that uses C++ modules. To know how such a project is
+compiled we not only need to know how its own source files are compiled, but
+also how to compile all the module interfaces that it consumes, including from
+other projects, transitively. One way to set this up would be to enable
+writing entries of the \c{bmi{\}} output target type to any database in the
+amalgamation:
+
+\
+$ bdep config create ../build-gcc @gcc cc \\
+  config.cxx=g++                          \\
+  config.cc.compiledb.filter=true         \\
+  config.cc.compiledb.filter.output=bmi
+
+
+$ bdep init @gcc config.cc.compiledb=libhello
+
+$ bdep init @gcc config.cc.compiledb=hello
+\
+
+With this setup \c{libhello.json} and \c{hello.json} will contain module
+interface entries from all the dependencies.
+
+\N|When debugging complex compilation database setups it can be helpful to
+increase diagnostics verbosity to level 6 in order to get a trace of filtering
+decisions (the relevant lines will contain the \c{compiledb} keyword).|
+
+
 \h#cc-gcc|GCC Compiler Toolchain|
 
 The GCC compiler id is \c{gcc}.
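
The new section notes that a library built in both static and shared variants ends up with two database entries per source file. The following is a minimal sketch, not part of the commit, of how a consumer might detect such files in a generated database. It assumes only the entry layout described by the Clang JSON Compilation Database specification (required `directory` and `file`, optional `output`). The function name and all the paths in the sample data are hypothetical.

```python
import json
from collections import defaultdict


def duplicated_files(entries):
    """Return, sorted, the source files that have more than one entry."""
    outputs = defaultdict(list)
    for entry in entries:
        # "file" is required by the format while "output" is optional.
        outputs[entry["file"]].append(entry.get("output"))
    return sorted(f for f, outs in outputs.items() if len(outs) > 1)


# Hypothetical database contents; every path here is made up for illustration.
SAMPLE = json.loads("""
[
  {"directory": "/tmp/build-gcc/libhello",
   "file": "/tmp/libhello/libhello/hello.cxx",
   "arguments": ["g++", "-c", "/tmp/libhello/libhello/hello.cxx"],
   "output": "/tmp/build-gcc/libhello/libhello/hello.a.o"},
  {"directory": "/tmp/build-gcc/libhello",
   "file": "/tmp/libhello/libhello/hello.cxx",
   "arguments": ["g++", "-fPIC", "-c", "/tmp/libhello/libhello/hello.cxx"],
   "output": "/tmp/build-gcc/libhello/libhello/hello.so.o"},
  {"directory": "/tmp/build-gcc/hello",
   "file": "/tmp/hello/hello/hello.cxx",
   "arguments": ["g++", "-c", "/tmp/hello/hello/hello.cxx"],
   "output": "/tmp/build-gcc/hello/hello/hello.o"}
]
""")

if __name__ == "__main__":
    print(duplicated_files(SAMPLE))
```

If a check like this reports files that the consumer cannot handle, the output target type filter described in the section (`config.cc.compiledb.filter.output`) is the documented way to drop one of the two entries.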