Update Bash guide with new pipefail recommendations

author: Boris Kolpackov <boris@codesynthesis.com> 2021-01-14 09:30:09 +0200
committer: Boris Kolpackov <boris@codesynthesis.com> 2021-01-14 09:30:09 +0200
commit: 33582e0b96985202cc1bf0c380247747af529c4a (patch)
tree: b88971c9b41988fb0ed4ec40d37d864324c214a4 /doc/bash-style.cli
parent: 3651965a5be8847ac54abf0b385dcbe479a7a972 (diff)
1 files changed, 89 insertions, 72 deletions
diff --git a/doc/bash-style.cli b/doc/bash-style.cli
index a76053a..c3e79d4 100644
--- a/doc/bash-style.cli
+++ b/doc/bash-style.cli
@@ -15,7 +15,7 @@
 \h1#intro|Introduction|
 
 Bash works best for simple tasks. Needing arrays, arithmetic, and so on, is
-usually a good indication that the task at hand is too complex for Bash.
+usually a good indication that the task at hand may be too complex for Bash.
 
 Most of the below rules can be broken if there is a good reason for it.
 Besides making things consistent, rules free you from having to stop and think
@@ -31,8 +31,11 @@ the former provides a lot more rationale compared to this guide.
 \h1#style|Style|
 
 Don't use any extensions for your scripts. That is, call it just \c{foo}
-rather than \c{foo.sh} or \c{foo.bash}. Use lower-case letters and dash
-to separate words, for example \c{foo-bar}.
+rather than \c{foo.sh} or \c{foo.bash} (though we do use the \c{.bash}
+extension for
+\l{https://build2.org/build2/doc/build2-build-system-manual.xhtml#module-bash
+Bash modules}). Use lower-case letters and dash to separate words, for example
+\c{foo-bar}.
 
 Indentation is two spaces (not tabs). Maximum line length is 79 characters
 (excluding newline). Use blank lines between logical blocks to improve
@@ -46,13 +49,16 @@ is written on the same line after a semicolon, for example:
 
 \
 if [ ... ]; then
+  ...
 fi
 
 for x in ...; do
+  ...
 done
 \
 
-Do use \c{elif} instead of nested \c{else} and \c{if}.
+Do use \c{elif} instead of nested \c{else} and \c{if} (and consider if
+\c{case} can be used instead).
 
 For \c{if} use \c{[ ]} for basic tests and \c{[[ ]]} if the previous form is
 not sufficient or hairy. In particular, \c{[[ ]]} results in cleaner code
@@ -82,7 +88,8 @@ usage=\"usage: $0 <OPTIONS>\"
 owd=\"$(pwd)\"
 trap \"{ cd '$owd'; exit 1; }\" ERR
 set -o errtrace   # Trap in functions and subshells.
-shopt -s lastpipe # Execute last pipeline command in current shell.
+set -o pipefail   # Fail if any pipeline command fails.
+shopt -s lastpipe # Execute last pipeline command in the current shell.
 
 function info () { echo \"$*\" 1>&2; }
 function error () { info \"$*\"; exit 1; }
@@ -416,8 +423,8 @@ function dist()
 A function can return data in two primary ways: exit code and stdout.
 Normally, exit code 0 means success and exit code 1 means failure though
 additional codes can be used to distinguish between different kinds of
-failures, signify special conditions, etc., see \l{#error-handing Error
-Handling} for details.
+failures (for example, \"hard\" and \"soft\" failures), signify special
+conditions, etc., see \l{#error-handing Error Handling} for details.
 
 A function can also write to stdout with the result available to the caller in
 the same way as from programs (command substitution, pipeline, etc). If a
@@ -426,28 +433,14 @@ with newlines with the caller using the \c{readarray} builtin to read them
 into an indexed array, for example:
 
 \
-function foo ()
+function func ()
 {
   echo one
   echo two
   echo three
 }
 
-foo | readarray -t r
-\
-
-In this case, if the function can fail, then the failure should be explicitly
-checked for (either by examining \c{PIPESTATUS} or via the lack of the
-result), since the \c{ERR} trap will not be triggered (unless the \c{pipefail}
-shell option is set; see \l{#error-handing Error Handling} for details). For
-example:
-
-\
-foo | readarray -t r
-
-if [ \"${PIPESTATUS[0]}\" -ne 0 ]; then
-  exit 1
-fi
+func | readarray -t r
 \
 
 \N|The use of the newline as a separator means that values may not contain
@@ -455,23 +448,19 @@ newlines. While \c{readarray} supports specifying a custom separator with the
 \c{-d} option, including a \c{NUL} separator, this support is only available
 since Bash 4.4.|
 
-This technique can also be extended to return an associative array by
+This technique can also be extended to return an associative array by first
 returning the values as an indexed array and then converting them to
 an associative array with \c{eval}, for example:
 
 \
-function foo ()
+function func ()
 {
   echo \"[a]=one\"
   echo \"[b]=two\"
   echo \"[c]=three\"
 }
 
-foo | readarray -t ia
-
-if [ \"${PIPESTATUS[0]}\" -ne 0 ]; then
-  exit 1
-fi
+func | readarray -t ia
 
 eval declare -A aa=(\"${ia[@]}\")
 \
@@ -480,7 +469,7 @@ Note that if a key or a value contains whitespaces, then it must be quoted.
 The recommendation is to always quote both, for example:
 
 \
-function foo ()
+function func ()
 {
   echo \"['a']='one ONE'\"
   echo \"['b']='two'\"
@@ -491,7 +480,7 @@ function foo ()
 Or, if returning a local array:
 
 \
-function foo ()
+function func ()
 {
   declare -A a=([a]='one ONE' [b]=two [c]=three)
 
@@ -508,30 +497,52 @@ For more information on returning data from functions, see
 \h1#error-handing|Error Handling|
 
 Our scripts use the \c{ERR} trap to automatically terminate the script in case
-any command fails. This is also propagated to functions and subshells by
-specifying the \c{errtrace} shell option.
-
-\N|While the \c{pipefail} and \c{nounset} options may also seem like a good
-idea, they have subtle, often latent pitfalls that make them more trouble than
-they are worth (see \l{https://mywiki.wooledge.org/BashPitfalls#pipefail
-\c{pipefail} pitfalls}, \l{https://mywiki.wooledge.org/BashPitfalls#nounset
-\c{nounset} pitfalls}).
-
-In particular, without \c{pipefail}, a non-zero exit of any command in the
-pipeline except the last is ignored. As a result, the pipeline needs to be
-designed to work correctly in such cases, normally by relying on the input (or
-lack thereof) to the last command to convey the failure. Alternatively, the
-exit status of the pipeline commands can be explicitly checked using the
-\c{PIPESTATUS} array.|
+any command fails. This semantics is also propagated to functions and subshells
+by specifying the \c{errtrace} shell option and to all the commands of a
+pipeline by specifying the \c{pipefail} option.
+
+\N|Without \c{pipefail}, a non-zero exit of any command in the pipeline except
+the last is ignored. The \c{pipefail} shell option is inherited by functions
+and subshells.|
+
+\N|While the \c{nounset} option may also seem like a good idea, it has
+subtle, often latent pitfalls that make it more trouble than it's worth (see
+\l{https://mywiki.wooledge.org/BashPitfalls#nounset \c{nounset} pitfalls}).|
+
+The \c{pipefail} semantics is not without pitfalls which should be kept in
+mind. In particular, if a command in a pipeline exits before reading the
+preceding command's output in its entirety, such a command may exit with a
+non-zero exit status (see \l{https://mywiki.wooledge.org/BashPitfalls#pipefail
+\c{pipefail} pitfalls} for details).
+
+\N|Note that in such a situation the preceding command may exit with zero
+status not only because it gracefully handled \c{SIGPIPE} but also because all
+of its output happened to fit into the pipe buffer.|
+
+For example, these are the two common pipelines that may exhibit this issue:
+
+\
+prog | head -n 1
+prog | grep -q foo
+\
+
+In these two cases, the simplest (though not the most efficient) way to work
+around this issue is to reimplement \c{head} with \c{sed} and to get rid of
+\c{-q} in \c{grep}, for example:
+
+\
+prog | sed -n -e '1p'
+prog | grep foo >/dev/null
+\
 
 If you need to check the exit status of a command, use \c{if}, for example:
 
 \
-if grep \"foo\" /tmp/bar; then
+if grep -q \"foo\" /tmp/bar; then
   info \"found\"
 fi
 
-if ! grep \"foo\" /tmp/bar; then
+if ! grep -q \"foo\" /tmp/bar; then
   info \"not found\"
 fi
 \
@@ -579,41 +590,41 @@ even if the \c{cd} command has failed.
 
 Note, however, that notwithstanding the above statement from the Bash manual,
 the \c{ERR} trap is executed inside all the subshell commands of a pipeline
-provided the \c{errtrace} option is specified.  As a result, the above code
-can be made to work using the pipe trick:
+provided the \c{errtrace} option is specified. As a result, the above code can
+be made to work by temporarily disabling \c{pipefail} and reimplementing it as
+a pipeline:
 
 \
+set +o pipefail
 cleanup /no/such/dir | cat
+r=\"${PIPESTATUS[0]}\"
+set -o pipefail
 
-if [ \"${PIPESTATUS[0]}\" -ne 0 ]; then
+if [ \"$r\" -ne 0 ]; then
   ...
 fi
 \
 
-\N|If \c{cleanup}'s \c{cd} fails, the \c{ERR} trap will be executed in the
-+subshell, causing it to exit with an error status which the parent shell then
-+makes available in \c{PIPESTATUS}.
-
-If the \c{pipefail} shell option is set, then the explicit \c{PIPESTATUS}
-check is not necessary since the function failure will trigger the \c{ERR}
-trap in the current shell.|
+\N|Here, if \c{cleanup}'s \c{cd} fails, the \c{ERR} trap will be executed in
+the subshell, causing it to exit with an error status, which the parent shell
+then makes available in \c{PIPESTATUS}.|
 
 The recommendation is then to avoid calling functions in contexts where the
-\c{ERR} trap is ignored resorting to the pipe trick where that's not possible.
-And to be mindful of the potential ambiguity between the true/false result and
-failure for other commands. The use of the \c{&&} and \c{||} command
-expressions is best left to the interactive shell.
+\c{ERR} trap is ignored, resorting to the above pipe trick where that's not
+possible, and to be mindful of the potential ambiguity between the true/false
+result and failure for other commands. The use of the \c{&&} and \c{||}
+command expressions is best left to the interactive shell.
 
 \N|The pipe trick cannot be used if the function needs to modify the global
-state. Such a function, however, can return the exit status also as part of
-the global state. The pipe trick can also be used to ignore the exit status
-of a command (provided \c{pipefail} is not set).|
+state. Such a function, however, might as well return the exit status also as
+part of the global state. The pipe trick can also be used to ignore the exit
+status of a command.|
 
 The pipe trick can also be used to distinguish between different exit codes,
 for example:
 
 \
-function foo()
+function func()
 {
   bar  # If this command fails, the function returns 1.
 
@@ -622,9 +633,12 @@ function foo()
   fi
 }
 
-foo | cat
+set +o pipefail
+func | cat
+r=\"${PIPESTATUS[0]}\"
+set -o pipefail
 
-case \"${PIPESTATUS[0]}\" in
+case \"$r\" in
   0)
     ;;
   1)
@@ -643,7 +657,7 @@ This technique can be further extended to implement functions that both
 return multiple exit codes and produce output, for example:
 
 \
-function foo()
+function func()
 {
   bar  # If this command fails, the function returns 1.
 
@@ -654,9 +668,12 @@ function foo()
   echo result
 }
 
-foo | readarray -t r
+set +o pipefail
+func | readarray -t r
+s=\"${PIPESTATUS[0]}\"
+set -o pipefail
 
-case \"${PIPESTATUS[0]}\" in
+case \"$s\" in
   0)
     echo \"${r[0]}\"
     ;;
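
The save-and-restore pattern introduced in the last hunks can be tried in isolation. The following is a minimal sketch, not part of the commit: the `producer` function and the `without`, `with`, and `status` variables are hypothetical names chosen for illustration, and no `ERR` trap is installed so that the script keeps running after the failing pipelines.

```shell
#!/usr/bin/env bash

# Hypothetical function: prints one line, then fails with exit code 3.
function producer ()
{
  echo result
  return 3
}

# Without pipefail the pipeline status is that of the last command (cat).
set +o pipefail
producer | cat >/dev/null
without=$?

# With pipefail the pipeline status is that of the rightmost failed command.
set -o pipefail
producer | cat >/dev/null
with=$?

# The pattern from the patch: disable pipefail, run the pipeline, save
# PIPESTATUS immediately (the next command overwrites it), then re-enable.
set +o pipefail
producer | cat >/dev/null
status="${PIPESTATUS[0]}"
set -o pipefail

echo "without=$without with=$with status=$status"
```

The exit status is saved in a separate scalar rather than in the variable that receives the pipeline's output, since assigning a plain value to an array variable replaces its first element.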