Expand error handling section, other tweaks in Bash style guide

author: Boris Kolpackov <boris@codesynthesis.com> 2021-01-08 08:17:16 +0200
committer: Boris Kolpackov <boris@codesynthesis.com> 2021-01-08 09:12:19 +0200
commit: 6dd41689e4ff9cd70eb9b7073a6b28b427980ce3 (patch)
tree: 6a896cbda0bd35054fb589fb12cf233355f7035b /doc
parent: 6f57674600977e8e69edc1ad2268835dbe8364c5 (diff)
1 files changed, 223 insertions, 23 deletions
diff --git a/doc/bash-style.cli b/doc/bash-style.cli
index ef81af2..1b32b1a 100644
--- a/doc/bash-style.cli
+++ b/doc/bash-style.cli
@@ -52,9 +52,17 @@ for x in ...; do
 done
 \
 
-For \c{if} use \c{[ ]} for basic tests and \c{[[ ]]} only if the previous form
-is not sufficient. Use \c{test} for filesystem tests (presence of files,
-etc). Do use \c{elif}.
+Do use \c{elif} instead of nested \c{else} and \c{if}.
+
+For \c{if} use \c{[ ]} for basic tests and \c{[[ ]]} if the previous form is
+not sufficient or becomes hairy. In particular, \c{[[ ]]} results in cleaner
+code for complex expressions, for example:
+
+\
+if [[ \"$foo\" && (\"$bar\" || \"$baz\") ]]; then
+  ...
+fi
+\
 
 \h1#struct|Structure|
 
@@ -73,7 +81,8 @@ usage=\"usage: $0 <OPTIONS>\"
 
 owd=\"$(pwd)\"
 trap \"{ cd '$owd'; exit 1; }\" ERR
-set -o errtrace # Trap in functions.
+set -o errtrace   # Trap in functions and subshells.
+shopt -s lastpipe # Execute last pipeline command in current shell.
 
 function info () { echo \"$*\" 1>&2; }
 function error () { info \"$*\"; exit 1; }
@@ -155,7 +164,7 @@ done
 \
 
 If the value you are expecting from the command line is a directory path,
-the always strip the trailing slash (as shown above for the \c{-t} option).
+then always strip the trailing slash (as shown above for the \c{-t} option).
 
 \h#struct-opt-arg-valid|OPTIONS-ARGUMENTS-VALIDATION|
 
@@ -201,19 +210,18 @@ list=\"$(basename \"$1\")\"
 \
 
 We also quote values that are \i{strings} as opposed to options/file names,
-paths, or integers. If setting a variable that will contain one of these
-unquoted values, try to give it a name that reflects its type (e.g.,
-\c{foo_file} rather than \c{foo_name}). Prefer single quotes for \c{sed}
+paths, enum-like values, or integers. Prefer single quotes for \c{sed}
 scripts, for example:
 
 \
-proto=\"https\"
-quiet=\"y\"
-verbosity=1
-dir=/etc
-out=/dev/null
-file=manifest
-seds='s%^./%%'
+url=\"https://example.org\"  # String.
+quiet=y                    # Enum-like.
+verbosity=1                # Integer.
+dir=/etc                   # Directory path.
+out=/dev/null              # File path.
+file=manifest              # File name.
+option=--quiet             # Option name.
+seds='s%^./%%'             # sed script.
 \
 
 Note that quoting will inhibit globbing so you may end up with expansions
@@ -279,11 +287,85 @@ echo \"files: ${files[@]}\"  # $1='files: one', $2='2 two', $3='three'
 echo \"files: ${files[*]}\"  # $1='files: one 2 two three'
 \
 
-\h1#trap|Trap|
+\h1#subshell|Subshell|
+
+Bash executes certain constructs in \i{subshells} and some of these constructs
+may not be obvious:
+
+\ul|
+
+\li|Explicit subshell: \c{(...)}|
+
+\li|Pipeline: \c{...|...}|
+
+\li|Command substitution: \c{$(...)}|
+
+\li|Process substitution: \c{<(...)}, \c{>(...)}|
+
+\li|Background: \c{...&}, \c{coproc ...}|
+
+|
+
+Naturally, a subshell cannot modify any state in the parent shell, which
+sometimes leads to counter-intuitive behavior, for example:
+
+\
+lines=()
+
+... | while read l; do
+  lines+=(\"$l\")
+done
+\
+
+At the end of the loop, \c{lines} will remain empty since the loop body is
+executed in a subshell. One way to resolve this is to use process
+substitution instead of the pipeline:
+
+\
+lines=()
+
+while read l; do
+  lines+=(\"$l\")
+done < <(...)
+\
+
+This, however, results in unnatural, backwards-looking (compared to the
+pipeline) code. Instead, we can request the last command of the pipeline to be
+executed in the parent shell with the \c{lastpipe} shell option, for example:
+
+\
+shopt -s lastpipe
+
+lines=()
+
+... | while read l; do
+  lines+=(\"$l\")
+done
+\
+
+\N|The \c{lastpipe} shell option is inherited by functions and subshells.|
+
+
+\h1#error-handing|Error Handling|
 
 Our scripts use the error trap to automatically terminate the script in case
-any command fails. If you need to check the exit status of a command, use
-\c{if}, for example:
+any command fails. This is also propagated to functions and subshells by
+specifying the \c{errtrace} shell option.
+
+\N|While the \c{pipefail} and \c{nounset} options may also seem like a good
+idea, they have subtle, often latent pitfalls that make them more trouble than
+they are worth (see \l{https://mywiki.wooledge.org/BashPitfalls#pipefail
+\c{pipefail} pitfalls}, \l{https://mywiki.wooledge.org/BashPitfalls#nounset
+\c{nounset} pitfalls}).
+
+In particular, without \c{pipefail}, a non-zero exit status of any command in
+the pipeline except the last is ignored. As a result, the pipeline needs to be
+designed to work correctly in such cases, normally by relying on the input (or
+lack thereof) to the last command to convey the failure. Alternatively, the
+exit status of the pipeline commands can be explicitly checked using the
+\c{PIPESTATUS} array.|
+
+If you need to check the exit status of a command, use \c{if}, for example:
 
 \
 if grep \"foo\" /tmp/bar; then
@@ -304,12 +386,130 @@ if v=\"$(...)\"; then
 fi
 \
 
-If you need to ignore the exit status, you can use \c{|| true}, for example:
+But keep in mind that in Bash a failure is often indistinguishable from a
+true/false result. For example, in the above \c{grep} command, the \c{if} takes
+the same branch whether there is no match or the file does not exist.
+
+Furthermore, in certain contexts, the above-mentioned error trap is ignored.
+Quoting from the Bash manual:
+
+\i{The \c{ERR} trap is not executed if the failed command is part of the
+command list immediately following an \c{until} or \c{while} keyword, part of
+the test following the \c{if} or \c{elif} reserved words, part of a command
+executed in a \c{&&} or \c{||} list except the command following the final
+\c{&&} or \c{||}, any command in a pipeline but the last, or if the command’s
+return status is being inverted using \c{!}. These are the same conditions
+obeyed by the \c{errexit} (\c{-e}) option.}
+
+To illustrate the gravity of this point, consider the following example:
 
 \
-foo || true
+function cleanup()
+{
+  cd \"$1\"
+  rm -f *
+}
+
+if ! cleanup /no/such/dir; then
+  ...
+fi
 \
 
+Here, the \c{cleanup()} function will continue executing (and may succeed)
+even if the \c{cd} command has failed.
+
+Note, however, that notwithstanding the above statement from the Bash manual,
+the trap is executed in all the commands of a pipeline provided the
+\c{errtrace} option is specified (presumably because commands of a pipeline
+are said to execute in subshells). As a result, the above code can be made to
+work using the pipe trick:
+
+\
+cleanup /no/such/dir | cat
+
+if [ \"${PIPESTATUS[0]}\" -ne 0 ]; then
+  ...
+fi
+\
+
+\N|If the \c{pipefail} shell option is set, then the explicit \c{PIPESTATUS}
+check is not necessary since the function failure will trigger the error trap
+in the current shell.|
+
+The recommendation is then to avoid calling functions in contexts where the
+error trap is ignored, resorting to the pipe trick where that's not possible,
+and to be mindful of the potential ambiguity between the true/false result and
+failure for other commands. The use of the \c{&&} and \c{||} command
+expressions is best left to the interactive shell.
+
+\N|The pipe trick cannot be used if the function needs to modify the global
+state. Such a function, however, can return the exit status also as part of
+the global state. The pipe trick can also be used to ignore the exit status
+of a command (provided \c{pipefail} is not set).|
+
+The pipe trick can also be used to distinguish between different exit codes,
+for example:
+
+\
+function foo()
+{
+  bar  # If this command fails, the function returns 1.
+
+  if ... ; then
+    return 2
+  fi
+}
+
+foo | cat
+
+case \"${PIPESTATUS[0]}\" in
+  0)
+    ;;
+  1)
+    exit 1
+    ;;
+  2)
+    ...
+    ;;
+esac
+\
+
+\N|In such functions it makes sense to keep exit code 1 to mean failure so
+that the inherited error trap can be re-used.|
+
+This technique can be further extended to implement functions that both
+return multiple exit codes and produce output, for example:
+
+\
+function foo()
+{
+  bar  # If this command fails, the function returns 1.
+
+  if ... ; then
+    return 2
+  fi
+
+  echo result
+}
+
+foo | readarray -t r
+
+case \"${PIPESTATUS[0]}\" in
+  0)
+    echo \"${r[0]}\"
+    ;;
+  1)
+    exit 1
+    ;;
+  2)
+    ...
+    ;;
+esac
+\
+
+\N|We use \c{readarray} instead of \c{read} since the latter fails if the
+left-hand side of the pipeline does not produce anything.|
+
 
 \h1#bool|Boolean|
 
@@ -347,8 +547,8 @@ For non-trivial/obvious functions also provide a short description of its
 functionality/purpose, for example:
 
 \
-# Prepare a distribution of the specified packages and place it into the
-# specified directory.
+# Prepare a distribution of the specified packages and place it
+# into the specified directory.
 #
 function dist() # <pkg> <dir>
 {
@@ -367,7 +567,7 @@ function dist()
 
 If the evaluation of the value may fail (e.g., it contains a program
 substitution), then place the assignment on a separate line since \c{local}
-will cause the error to be ignore. For example:
+will cause the error to be ignored. For example:
 
 \
 function dist()
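
Note on the Subshell section added above: the following is a minimal sketch of the behavior it describes, not text from the style guide. It assumes a non-interactive Bash script, since lastpipe has no effect while job control is active. The produce function is a hypothetical stand-in for the elided producer in the guide's examples, and read -r is used here only to keep backslashes in the input intact.

```shell
#!/usr/bin/env bash

# Hypothetical producer (an assumption for illustration): prints three lines.
function produce () { printf '%s\n' one two three; }

# Without lastpipe the loop body runs in a subshell, so the changes to
# the array are lost when the pipeline finishes.
shopt -u lastpipe
before=()
produce | while read -r l; do
  before+=("$l")
done

# With lastpipe the last pipeline command runs in the current shell,
# so the array keeps the lines read by the loop.
shopt -s lastpipe
after=()
produce | while read -r l; do
  after+=("$l")
done

echo "without lastpipe: ${#before[@]} line(s)"
echo "with lastpipe: ${#after[@]} line(s)"
```

Saving this to a file and running it with bash shows the two counts side by side. Sourcing it into an interactive shell may give a different result for the second loop because of the job control caveat.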