// file      : doc/manual.cli
// copyright : Copyright (c) 2014-2019 Code Synthesis Ltd
// license   : MIT; see accompanying LICENSE file

"\name=build2-build-bot-manual"
"\subject=build bot"
"\title=Build Bot"

// NOTES
//
// - Maximum line is 70 characters.
//

"
\h0#preface|Preface|

This document describes \c{bbot}, the \c{build2} build bot. For the build bot
command line interface refer to the \l{bbot-agent(1)} and \l{bbot-worker(1)}
man pages.

\h1#intro|Introduction|

\h1#arch|Architecture|

The \c{bbot} architecture includes several layers for security and
manageability. At the top we have a \c{bbot} running in the \i{controller}
mode. The controller monitors various \i{build sources} for \i{build
tasks}. For example, a controller may poll a \c{brep} instance for any new
packages to build as well as monitor a \cb{git} repository for any new commits
to test. There can be several layers of controllers with \c{brep} being just a
special kind. A machine running a \c{bbot} instance in the controller mode is
called a \i{controller host}.

Below the controllers we have a \c{bbot} running in the \i{agent} mode
normally on Build OS. The agent polls its controllers for \i{build tasks} to
perform. A machine running a \c{bbot} instance in the agent mode is called a
\i{build host}.

The actual building is performed in the virtual machines and/or containers
that are executed on the build host. Inside virtual machines/containers,
\c{bbot} is running in the \i{worker mode} and receives build tasks from its
agent. Virtual machines and containers running a \c{bbot} instance in the
worker mode are collectively called \i{build machines}.

Let's now examine the workflow in the other direction, that is, from a worker
to a controller. Once a build machine is booted (by the agent), the worker
inside connects to the TFTP server running on the build host and downloads the
\i{build task manifest}. It then proceeds to perform the build task and
uploads the \i{build result manifest} (which includes build logs) to the TFTP
server.

Once an agent receives a build task for a specific build machine, it goes
through the following steps. First, it creates a directory on its TFTP server
with the \i{machine name} as its name and places the build task manifest
inside. Next, it makes a throw-away snapshot of the build machine and boots
it. After booting the build machine, the agent monitors the machine directory
on its TFTP server for the build result manifest (uploaded by the worker once
the build has completed). Once the result manifest is obtained, the agent
shuts down the build machine and discards its snapshot.
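
The agent steps above can be sketched as follows. This is only an
illustration under stated assumptions: the \c{boot} and \c{shutdown} callbacks
are hypothetical stand-ins for taking and discarding the snapshot, the
\c{result.manifest} file name is assumed, and a local directory stands in for
the TFTP server root.

```python
import time
from pathlib import Path


def run_task(tftp_root, machine_name, task_manifest, boot, shutdown,
             timeout=5.0, interval=0.1):
    '''Place the task for one machine and wait for the worker result.'''
    # The machine directory is named after the machine.
    machine_dir = Path(tftp_root) / machine_name
    machine_dir.mkdir(parents=True, exist_ok=True)
    (machine_dir / 'task.manifest').write_text(task_manifest)

    # Hypothetical hook: make a throw-away snapshot and boot it.
    snapshot = boot(machine_name)
    try:
        result = machine_dir / 'result.manifest'  # Assumed file name.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if result.exists():
                return result.read_text()
            time.sleep(interval)
        return None  # No result in time; the caller decides what to do.
    finally:
        # Hypothetical hook: shut down and discard the snapshot.
        shutdown(snapshot)
```

The \c{finally} clause reflects that the snapshot is discarded whether or not
a result was obtained.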

To obtain a build task the agent polls via HTTP/HTTPS one or more
controllers. Before each poll request the agent enumerates the available build
machines and sends this information as part of the request. The controller
responds with a build task manifest that identifies a specific build machine
to use.

If the controller has higher-level controllers (for example, \c{brep}), then
it aggregates the available build machines from its agents and polls these
controllers (just as an agent would), forwarding build tasks to suitable
agents. In this case we say that the \i{controller acts as an agent}. The
controller may also be configured to monitor build sources, such as SCM
repositories, directly in which case it generates build tasks itself.

In this architecture the build results are propagated up the chain: from a
worker, to its agent, to its controller, and so on. A controller that is the
final destination of a build result uses email to notify interested parties of
the outcome. For example, \c{brep} would send a notification to the package
owner if the build failed. Similarly, a \c{bbot} controller that monitors a
\cb{git} repository would send an email to a committer if their commit caused a
build failure. The email would include a link (normally HTTP/HTTPS) to the
build logs hosted by the controller.

\h#arch-machine-config|Configurations|

The \c{bbot} architecture distinguishes between a \i{machine configuration}
and a \i{build configuration}. The machine configuration captures the
operating system, installed compiler toolchain, and so on. The same build
machine may be used to \"generate\" multiple \i{build configurations}. For
example, the same machine can normally be used to produce 32/64-bit and
debug/optimized builds.

The machine configuration is \i{approximately} encoded in its \i{machine
name}. The machine name is a list of components separated with \c{-}.
Components cannot be empty and must contain only alpha-numeric characters,
underscores, dots, and pluses with the whole id being a portably-valid path
component.

The encoding is approximate in the sense that it captures only what's important
to distinguish in a particular \c{bbot} deployment.

The first component normally identifies the operating system and has the
following recommended form:

\
[_][_][_]
\

For example:

\
windows
windows_10
windows_10.1607
i686_windows_xp
bsd_freebsd_10
linux_centos_6.2
linux_ubuntu_16.04
macos_10.12
\

The second component normally identifies the installed compiler toolchain and
has the following recommended form:

\
[][][]
\

For example:

\
gcc
gcc_6
gcc_6.3
gcc_6.3_mingw_w64
clang_3.9_libc++
clang_3.9_libstdc++
msvc_14
msvc_14u3
icc
\

Some examples of complete machine names:

\
windows_10-msvc_14u3
macos_10.12-clang_10.0
linux_ubuntu_16.04-gcc_6.3
\
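
The component rules can be checked mechanically. The following is a minimal
sketch, assuming that alpha-numeric means ASCII letters and digits; the
\c{split_machine_name} function is a hypothetical helper and the
portably-valid path component requirement is not verified here.

```python
import re

# One component: alpha-numeric characters, underscores, dots, and pluses.
COMPONENT = re.compile('[A-Za-z0-9_.+]+')


def split_machine_name(name):
    '''Return the list of components or raise ValueError.'''
    components = name.split('-')
    for c in components:
        # An empty component (for example, from a double dash) is rejected
        # because the pattern requires at least one character.
        if not COMPONENT.fullmatch(c):
            raise ValueError('invalid component %r in %r' % (c, name))
    return components
```

With this sketch \c{linux_ubuntu_16.04-gcc_6.3} splits into the
\c{linux_ubuntu_16.04} and \c{gcc_6.3} components while a name with an empty
component is rejected.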

Similarly, the build configuration is encoded in a \i{configuration name}
using the same format. As described in \l{#arch-controller Controller Logic},
build configurations are generated from machine configurations. As a result,
it usually makes sense to have the first component identify the operating
system and the second component \- the toolchain with the rest identifying a
particular build configuration variant, for example, optimized, sanitized,
etc. For example:

\
windows-vc_14-O2
linux-gcc_6-O3_asan
\

\h#arch-machine-header|Machine Header Manifest|

@@ TODO: need ref to general manifest overview in bpkg, or, better yet,
move it to libbutl and ref to that from both places.

The build machine header manifest contains basic information about a build
machine on the build host. A list of machine header manifests is sent by
\c{bbot} agents to controllers. The manifest synopsis is presented next
followed by the detailed description of each value in subsequent sections.

\
id: 
name: 
summary: 
\

For example:

\
id: windows_10-msvc_14-1.3
name: windows_10-msvc_14
summary: Windows 10 build 1607 with VC 14 update 3
\
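
To illustrate how such a header could be consumed, below is a minimal sketch
that handles only single-line \c{name: value} pairs. The
\c{parse_machine_header} function is hypothetical and the sketch does not
attempt to cover the complete manifest format (for example, multi-line
values).

```python
REQUIRED = ('id', 'name', 'summary')


def parse_machine_header(text):
    '''Parse single-line name: value pairs of a machine header.'''
    values = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # Skip blank lines.
        # Split at the first colon only; values may contain colons.
        name, sep, value = line.partition(':')
        if not sep or not name.strip():
            raise ValueError('expected name: value, got %r' % line)
        values[name.strip()] = value.strip()
    missing = [n for n in REQUIRED if n not in values]
    if missing:
        raise ValueError('missing values: ' + ', '.join(missing))
    return values
```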

\h2#arch-machine-header-id|\c{id}|

\
id: 
\

The unique machine version/revision/build identifier. For virtual machines
this can be the disk image checksum. For a container this can be a UUID that is
re-generated every time the container filesystem is altered.


\h2#arch-machine-header-name|\c{name}|

\
name: 
\

The machine name.


\h2#arch-machine-header-summary|\c{summary}|

\
summary: 
\

The one-line description of the machine.


\h#arch-machine|Machine Manifest|

The build machine manifest contains the complete description of a build
machine on the build host (see the Build OS documentation for their origin and
location). The machine manifest starts with the machine manifest header with
all the header values appearing before any non-header values. The non-header
part of the manifest synopsis is presented next followed by the detailed
description of each value in subsequent sections.

\
type: kvm|nspawn
[mac]: 
[options]: 
[changes]: 
\


\h2#arch-machine-type|\c{type}|

\
type: kvm|nspawn
\

The machine type. Valid values are \c{kvm} (QEMU/KVM virtual machine) and
\c{nspawn} (\c{systemd-nspawn} container).


\h2#arch-machine-mac|\c{mac}|

\
[mac]: 
\

The fixed MAC address for the machine. Must be in the hexadecimal,
colon-separated format. For example:

\
mac: de:ad:be:ef:de:ad
\

If it is not specified, then a random address is generated on the first
machine bootstrap which is then reused for each build/re-bootstrap. Note that
if you specify a fixed address, then the machine can only be used by a single
\c{bbot} agent.


\h2#arch-machine-options|\c{options}|

\
[options]: 
\

The list of machine options. The exact semantics is machine type-dependent
(see below). A single level of quotes (either single or double) is removed in
each option before being passed on. Options can be separated with spaces or
newlines.

For \c{kvm} machines, if this value is present, then it replaces the default
network and disk configuration when starting the QEMU/KVM hypervisor. The
options are pre-processed by replacing the question mark in \c{ifname=?} and
\c{mac=?} strings with the network interface and MAC address, respectively.
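
The substitution step can be sketched as below. The
\c{preprocess_kvm_options} function is a hypothetical helper that operates on
options which have already been split and unquoted; how the agent actually
chooses the interface name is not covered.

```python
def preprocess_kvm_options(options, ifname, mac):
    '''Replace the ? in ifname=? and mac=? with the actual values.'''
    result = []
    for o in options:
        o = o.replace('ifname=?', 'ifname=' + ifname)
        o = o.replace('mac=?', 'mac=' + mac)
        result.append(o)
    return result
```

For example, given an assumed interface name \c{tap0}, the option
\c{tap,id=net0,ifname=?,script=no} would become
\c{tap,id=net0,ifname=tap0,script=no}.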


\h2#arch-machine-changes|\c{changes}|

\
[changes]: 
\

The description of machine changes in this version.

Multiple \c{changes} values can be present which are all concatenated in the
order specified, that is, the first value is considered to be the most recent.
For example:

\
changes: 1.1: initial version
changes: 1.2: increased disk size to 30GB
\

Or:

\
changes:\
1.1
  - initial version

1.2
  - increased disk size to 30GB
  - upgraded bootstrap baseutils
\\
\


\h#arch-task|Task Manifest|

The task manifest describes a build task. It consists of two groups of values.
The first group defines the package to build. The second group defines the
build configuration to use for building the package. The manifest synopsis is
presented next followed by the detailed description of each value in
subsequent sections.

\
name: 
version: 
#location: 
repository-url: 
[repository-type]: pkg|git|dir
[trust]: 

machine: 
target: 
[environment]: 
[config]: 
[warning-regex]: 
\


\h2#arch-task-name|\c{name}|

\
name: 
\

The package name to build.


\h2#arch-task-version|\c{version}|

\
version: 
\

The package version to build.


\h2#arch-task-repository-url|\c{repository-url}|

\
repository-url: 
\

The URL of the repository that contains the package and its dependencies.


\h2#arch-task-repository-type|\c{repository-type}|

\
[repository-type]: pkg|git|dir
\

The repository type (see \c{repository-url} for details). Alternatively, the
repository type can be specified as part of the URL scheme. See
\l{bpkg-repository-types(1)} for details.

\h2#arch-task-trust|\c{trust}|

\
[trust]: 
\

The SHA256 repository certificate fingerprint to trust (see the \c{bpkg}
\c{--trust} option for details). This value may be specified multiple times to
establish the authenticity of multiple certificates. If the special \c{yes}
value is specified, then all repositories will be trusted without
authentication (see the \c{bpkg} \c{--trust-yes} option).

Note that while the controller may return a task with \c{trust} values,
whether they will be used is up to the agent's configuration. For example,
some agents may only trust their internally-specified fingerprints to prevent
\"man in the middle\" attacks.


\h2#arch-task-machine|\c{machine}|

\
machine: 
\

The name of the build machine to use.


\h2#arch-task-target|\c{target}|

\
target: 
\

The target to build for.

Compared to the autotools terminology, the \c{machine} value corresponds to
\c{--build} (the machine we are building on) and \c{target} \- to \c{--host}
(the machine we are building for). While we use essentially the same \i{target
triplet} format as autotools for \c{target}, it is not flexible enough for
\c{machine}.


\h2#arch-task-environment|\c{environment}|

\
[environment]: 
\

The name of the build environment to use. See \l{#arch-worker Worker Logic}
for details.


\h2#arch-task-config|\c{config}|

\
[config]: 
\

The additional configuration options and variables. A single level of quotes
(either single or double) is removed in each value before being passed to
\c{bpkg}. For example, the following value:

\
config: config.cc.coptions=\"-O3 -stdlib='libc++'\"
\

Will be passed to \c{bpkg} as the following (single) argument:

\
config.cc.coptions=-O3 -stdlib='libc++'
\

Values can be separated with spaces or newlines. See \l{#arch-controller
Controller Logic} for details.
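
The quote removal described above can be sketched as follows. This is an
approximation written for illustration: the \c{split_unquote} function is
hypothetical and it assumes that a quote character of the other kind is
literal inside a quoted run and that there are no escape sequences.

```python
DOUBLE, SINGLE = chr(34), chr(39)  # The two quote characters.


def split_unquote(value):
    '''Split on unquoted whitespace, removing one level of quotes.'''
    args, current, quote, started = [], [], None, False
    for ch in value:
        if quote is not None:
            if ch == quote:
                quote = None          # Closing quote: drop it.
            else:
                current.append(ch)    # Anything else is literal.
        elif ch in (DOUBLE, SINGLE):
            quote = ch                # Opening quote: drop it.
            started = True
        elif ch.isspace():
            if started:               # Whitespace ends the argument.
                args.append(''.join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
    if quote is not None:
        raise ValueError('unterminated quote in %r' % value)
    if started:
        args.append(''.join(current))
    return args
```

Applied to the \c{config} value from the example above, this sketch yields a
single argument with the outer double quotes removed and the inner single
quotes preserved.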


\h2#arch-task-warning-regex|\c{warning-regex}|

\
[warning-regex]: 
\

Additional regular expressions that should be used to detect warnings in the
build logs. Note that only the first 512 bytes of each log line are considered.

A single level of quotes (either single or double) is removed in each
expression before being used for search. For example, the following value:

\
warning-regex: \"warning C4\d{3}: \"
\

Will be treated as the following (single) regular expression (with a trailing
space):

\
warning C4\d{3}:
\

Expressions can be separated with spaces or newlines. They will be added to
the following default list of regular expressions that detect the \c{build2}
toolchain warnings:

\
^warning:
^.+: warning:
\

Note that this built-in list also covers GCC and Clang warnings (for the
English locale).
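
As an illustration of how the default and additional expressions combine,
consider the following sketch. It uses the Python \c{re} module as a stand-in
for whatever regular expression flavor \c{bbot} uses, takes the default list
as shown above, and the \c{has_warning} function is hypothetical.

```python
import re

DEFAULT_WARNING_REGEXES = ['^warning:', '^.+: warning:']


def has_warning(line, extra_regexes=()):
    '''Check the first 512 bytes of a log line for a warning.'''
    # Only the beginning of the line is searched.
    head = line.encode('utf-8')[:512]
    for r in list(DEFAULT_WARNING_REGEXES) + list(extra_regexes):
        if re.search(r.encode('utf-8'), head):
            return True
    return False
```

In this sketch a line such as \c{x.cpp(3): warning C4101: ...} is not
detected by the default list but is detected once an additional expression
like \c{warning C4[0-9]{3}: } is supplied.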


\h#arch-result|Result Manifest|

The result manifest describes a build result. The manifest synopsis is
presented next followed by the detailed description of each value in
subsequent sections.

\
name: 
version: 

status:                  
[configure-status]:      
[update-status]:         
[test-status]:           
[install-status]:        
[test-installed-status]: 
[uninstall-status]:      

[configure-log]:      
[update-log]:         
[test-log]:           
[install-log]:        
[test-installed-log]: 
[uninstall-log]:      
\


\h2#arch-result-name|\c{name}|

\
name: 
\

The package name from the task manifest.


\h2#arch-result-version|\c{version}|

\
version: 
\

The package version from the task manifest.


\h2#arch-result-status|\c{status}|

\
status: 
\

The overall (cumulative) build result status. Valid values are:

\
success    # All operations completed successfully.
warning    # One or more operations completed with warnings.
error      # One or more operations completed with errors.
abort      # One or more operations were aborted.
abnormal   # One or more operations terminated abnormally.
\

The \c{abort} status indicates that the operation has been aborted by
\c{bbot}, for example, because it was consuming too many resources and/or was
taking too long. Note that a task can be aborted by both the \c{bbot} worker
and the agent. In the latter case the whole machine is shut down and no
operation-specific status or logs will be included (@@ Maybe we should just
include 'log:' with commands that start VM, for completeness?).

The \c{abnormal} status indicates that the operation has terminated
abnormally, for example, due to the package manager or build system crash.

Note that the overall \c{status} value should appear before any per-operation
\c{*-status} values.
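
One way to derive the cumulative status from the per-operation ones is
sketched below. It assumes that the statuses are listed above in the order of
increasing severity and that the overall status is the most severe one seen;
the \c{overall_status} function is hypothetical.

```python
# Statuses in the order listed above, assumed to be increasing severity.
STATUSES = ['success', 'warning', 'error', 'abort', 'abnormal']


def overall_status(operation_statuses):
    '''Return the most severe of the per-operation statuses.'''
    worst = 'success'
    for s in operation_statuses:
        # list.index() raises ValueError for an unknown status.
        if STATUSES.index(s) > STATUSES.index(worst):
            worst = s
    return worst
```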


\h2#arch-result-x-status|\c{*-status}|

\
[*-status]: 
\

The per-operation result status. Note that the \c{*-status} values should
appear in the same order as the corresponding operations were performed
and for each \c{*-status} there should be the corresponding \c{*-log}
value. Currently supported operation names:

\
configure
update
test
install
test-installed
uninstall
\


\h2#arch-result-x-log|\c{*-log}|

\
[*-log]: 
\

The per-operation result log. Note that the \c{*-log} values should appear
last and in the same order as the corresponding \c{*-status} values. For
the list of supported operation names refer to the \c{*-status} value
description.
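
The ordering requirements for the \c{status}, \c{*-status}, and \c{*-log}
values can be expressed as a small check. This is a sketch only: the
\c{check_result_order} function is hypothetical and takes the value names of
an already parsed result manifest in the order they appear.

```python
OPERATIONS = ['configure', 'update', 'test', 'install',
              'test-installed', 'uninstall']


def check_result_order(names):
    '''Check the order of value names in a result manifest.'''
    ops = [n[:-len('-status')] for n in names if n.endswith('-status')]
    logs = [n[:-len('-log')] for n in names if n.endswith('-log')]
    if any(o not in OPERATIONS for o in ops + logs):
        return False  # Unsupported operation name.
    if ops != logs:
        return False  # Each status needs a log, in the same order.
    if 'status' not in names:
        return False
    # The overall status comes before any per-operation status.
    first = min((names.index(o + '-status') for o in ops),
                default=len(names))
    if names.index('status') > first:
        return False
    # The logs come last.
    tail = names[len(names) - len(logs):]
    return tail == [o + '-log' for o in logs]
```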


\h#arch-task-req|Task Request Manifest|

An agent (or controller acting as an agent) sends a task request to its
controller via HTTP/HTTPS POST method (@@ URL/API endpoint). The task request
starts with the task request manifest followed by a list of machine manifests.
The task request manifest synopsis is presented next followed by the detailed
description of each value in subsequent sections.

\
agent: 
toolchain-name: 
toolchain-version: 
[fingerprint]: 
\


\h2#arch-task-req-agent|\c{agent}|

\
agent: 
\

The name of the agent host (\c{hostname}). The name should be unique in a
particular \c{bbot} deployment.


\h2#arch-task-req-toolchain-name|\c{toolchain-name}|

\
toolchain-name: 
\

The \c{build2} toolchain name being used by the agent.


\h2#arch-task-req-toolchain-version|\c{toolchain-version}|

\
toolchain-version: 
\

The \c{build2} toolchain version being used by the agent.


\h2#arch-task-req-fingerprint|\c{fingerprint}|

\
[fingerprint]: 
\

The SHA256 fingerprint of the agent's public key. An agent may be configured
not to use the public key-based authentication in which case it does not
include this value. However, the controller may be configured to require the
authentication in which case it should respond with the 401 (unauthorized)
HTTP status code.


\h#arch-task-res|Task Response Manifest|

A controller sends the task response manifest in response to the task request
initiated by an agent. The response is delivered as a result of the POST
method. The task response starts with the task response manifest optionally
followed by the task manifest. The task response manifest synopsis is
presented next followed by the detailed description of each value in
subsequent sections.

\
session: 
[challenge]: 
[result-url]: 
\


\h2#arch-task-res-session|\c{session}|

\
session: 
\

The identifier assigned to this session by the controller. An empty value
indicates that the controller has no tasks at this time in which case all the
following values as well as the task manifest are absent.


\h2#arch-task-res-challenge|\c{challenge}|

\
[challenge]: 
\

The random, 64-character string (nonce) used to challenge the agent's
private key. If present, then the agent must sign this string and include the
signature in the result request (see below).

The signature should be calculated by encrypting the string with the agent's
private key and then \c{base64}-encoding the result.


\h2#arch-task-res-result-url|\c{result-url}|

\
[result-url]: 
\

The URL to POST (upload) the result request to.


\h#arch-result-req|Result Request Manifest|

On completion of a task an agent (or controller acting as an agent) sends the
result (upload) request to the controller via the POST method using the URL
returned in the task response (see above). The result request starts with the
result request manifest followed by the result manifest. Note that there is no
result response and only a successful but empty POST result is returned. The
result request manifest synopsis is presented next followed by the detailed
description of each value in subsequent sections.

\
session: 
[challenge]: 
\


\h2#arch-result-req-session|\c{session}|

\
session: 
\

The session id as returned by the controller in the task response.


\h2#arch-result-req-challenge|\c{challenge}|

\
[challenge]: 
\

The answer to the private key challenge as posed by the controller in the task
response. It must be present only if the challenge value was present in the
task response.


\h#arch-worker|Worker Logic|

The \c{bbot} worker builds each package in a \i{build environment} that is
established for a particular build target. The environment has three
components: the execution environment (environment variables, etc), build
system modules, as well as configuration options and variables.

Setting up the environment is performed by an executable (script, batch
file, etc). Specifically, upon receiving a build task, if it specifies the
environment name then the worker looks for the environment setup executable
with this name in a specific directory and for the executable called
\c{default} otherwise. Not being able to locate the environment executable is
an error.
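
The lookup rule can be sketched as follows. The \c{find_environment} function
and the exact-name match are assumptions made for illustration; in particular,
the sketch does not deal with file extensions (as in batch files) or check
that the file is executable.

```python
from pathlib import Path


def find_environment(env_dir, name=None):
    '''Return the path of the environment setup executable.'''
    # Use the name from the task if specified and default otherwise.
    path = Path(env_dir) / (name if name else 'default')
    if not path.is_file():
        raise FileNotFoundError('no environment executable %s' % path)
    return path
```

Note that in this sketch a task that names a missing environment is an error
rather than a reason to fall back to \c{default}, which follows the
description above.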

Once the environment setup executable is determined, the worker re-executes
itself as that executable passing to it as command line arguments the target
name, the path to the \c{bbot} worker to be executed once the environment is
set up, and any additional options that need to be propagated to the re-executed
worker. The environment setup executable is executed in the build directory as
its current working directory. The build directory contains the build task
\c{task.manifest} file.

The environment setup executable sets up the necessary execution environment
for example by adjusting \c{PATH} or running a suitable \c{vcvars} batch file.
It then re-executes itself as the \c{bbot} worker passing to it as command
line arguments (in addition to worker options) the list of build system
modules (\c{}) and the list of configuration options and variables
(\c{}). The environment setup executable must execute the
\c{bbot} worker in the build directory as the current working directory.

The re-executed \c{bbot} worker then proceeds to test the package from the
repository by executing the following commands, collectively called a
\i{worker script}. Each command has a unique \i{step id} that can be used as a
prefix in the \c{}, \c{}, and \c{}
values as discussed in \l{#arch-controller Controller Logic}. The
\c{<>}-values are from the task manifest and the environment:

\
# bpkg.configure.create
#
bpkg -V create   

# bpkg.configure.add
#
bpkg -v add 

# bpkg.configure.fetch
#
bpkg -v fetch --trust 

# bpkg.configure.build
#
bpkg -v build --yes --configure-only /

# bpkg.update.update
#
bpkg -v update 

# if the test operation is supported by the package:
#
# bpkg.test.test
#
bpkg -v test 

# for each package referred to by the tests, examples, or benchmarks
# package manifest values:
#
{
  # bpkg.configure.build
  #
  bpkg -v build --yes --configure-only \\
       ' []'

  # bpkg.update.update
  #
  bpkg -v update 

  # bpkg.test.test
  #
  bpkg -v test 
}

# if config.install.root is specified:
#
{
  # bpkg.install.install
  #
  bpkg -v install 

  # if the package contains subprojects that support the test
  # operation:
  #
  {
    # b.test-installed.create
    #
    b -V create   

    # b.test-installed.configure
    #
    b -v configure

    # b.test-installed.test
    #
    b -v test
  }

  # if any of the tests, examples, or benchmarks package manifest
  # values are specified:
  #
  {
    # bpkg.test-installed.create
    #
    bpkg -V create   

    # bpkg.configure.add
    #
    bpkg -v add 

    # bpkg.configure.fetch
    #
    bpkg -v fetch --trust 

    # for each package referred to by the tests, examples, or
    # benchmarks package manifest values:
    #
    {
      # bpkg.configure.build
      #
      bpkg -v build --yes --configure-only \\
           ' []'

      # bpkg.update.update
      #
      bpkg -v update 

      # bpkg.test.test
      #
      bpkg -v test 
    }
  }

  # bpkg.uninstall.uninstall
  #
  bpkg -v uninstall 
}
\

For details on configuring and testing installation refer to
\l{#arch-controller Controller Logic}.

As an example, the following POSIX shell script can be used to set up the
environment for building C and C++ packages with GCC 9 on most Linux
distributions.

\
#!/bin/sh

# Environment setup script for C/C++ compilation with GCC 9.
#
# $1  - target
# $2  - bbot executable
# $3+ - bbot options

set -e # Exit on errors.

mode=
case \"$1\" in
  x86_64-*)
    #mode=-m64
    ;;
  i?86-*)
    mode=-m32
    ;;
  *)
    echo \"unknown target: '$1'\" 1>&2
    exit 1
    ;;
esac
shift

exec \"$@\" cc config.c=\"gcc-9 $mode\" config.cxx=\"g++-9 $mode\"
\

\h#arch-controller|Controller Logic|

A \c{bbot} controller that issues its own build tasks maps available build
machines (as reported by agents) to \i{build configurations} according to the
\c{buildtab} configuration file. Blank lines and lines that start with \c{#}
are ignored. All other lines in this file have the following format:

\
  [/]  []* []*

 = [:](|