I consider shellcheck absolutely essential if you're writing even a single line of Bash. I also start all my scripts with this "unofficial bash strict mode" and DIR= shortcut:
#!/usr/bin/env bash
### Bash Environment Setup
# http://redsymbol.net/articles/unofficial-bash-strict-mode/
# https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
# set -o xtrace
set -o errexit
set -o errtrace
set -o nounset
set -o pipefail
IFS=$'\n'
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
I've had a thought percolating (for many months?) but I haven't tried to phrase it and I'm leery it may sound condescending (this is also more of a public address than direct response) ...
Shellcheck tends to steer us away from "weird" parts of Bash/shell and will happily suggest changing code that was correct as written. There are good reasons for this (and I still think most scripts/projects should use Shellcheck), but Bash is a weird language, and there's a lot of power/potential in some of the weird stuff Shellcheck walls off.
If I had to nail down some heuristics for working with it...:
- do lean heavily on Shellcheck for writing new code if you're unfamiliar with the language or are only writing it under duress
- don't implement its suggestions in bulk (the time you save on iterating can easily be blown later trying to debug subtle behavior changes)
- don't apply them to code you don't understand (if you have the time and interest to understand it but find yourself stuck on some un-searchable syntax, explainshell may help)
- don't adopt them without careful testing
- do take Shellcheck warnings about code that is correct-as-written as a hint to leave a comment explaining what's going on
- do leave Shellcheck off (or only use it intermittently) if you're trying to explore, play, or otherwise learn the language
For me this is way too much to invest in Bash. Any non-trivial utilities should be written in a real programming language when possible, especially when they are used in a context where they are likely to accrete complexity over time. I’m sure there are good use cases for complex Bash programs, but we should be reaching for something with fewer footguns by default.
(Shellcheck is an amazing utility and I use it for all Bash that I write.)
What is? Did I suggest anyone write complex Bash programs?
Shellcheck is amazing. As I suggested in the previous post, I use it for most of my own shell scripts/projects, especially stuff I intend to release. But, because shell and Bash are weird and full of pitfalls, I think it's worth making sure people know you can't just Shellcheck-and-ship.
I think his nitpick was you promoting weird bash tricks. They're neat to use but usually a pain for someone else to maintain. As a general rule, don't use them; don't try to be clever unless you're 100% sure that it's code you'll be the only one to see (which can rarely be guaranteed).
> and there's a lot of power/potential in some of the weird stuff Shellcheck walls off.
Can you give an example? Something Shellcheck warns you away from but which, if you used it, would be better (more expressive / more powerful / more whatever) than a shellcheck-approved solution?
This gets the information without another external utility (grep, cut, sed, awk...)
It's well-past bedtime, so for now I'm just leaving the first one I can think of (it's top-of-mind since I Shellchecked it in the last couple weeks... :])
It's a little contrived in this case, but another common example is a related suggestion for `read -r`:
get_nix_version_parts(){
  local major minor patch
  # shellcheck disable=SC2034,SC2162
  IFS="." read major minor patch < <(get_nix_version)
  local -p
}
$ get_nix_version_parts
major=2
minor=3
patch=4
Using -r there works fine for me, though your SC2034 complaint is valid, in that it essentially prevents you from using a very specific built-in formatter provided by "local -p".
This feels like a rare case, and I'd have no problem doing the "disable=SC2034" there, but I'd add a note explaining why.
Even then, one could make an argument (I think I would, actually) that an explicit printf is clearer and more robust. For example, it gives you more control over the formatting, and it doesn't introduce implicit coupling between the output of "local -p" and our function, which in theory could change in the future or be different on different bash versions, etc. I suspect this probability is very low in the particular case of local -p, but still it's not a great practice. Not something I want to give the stamp of approval inside my code base.
get_nix_version_parts(){
  local major minor patch
  IFS="." read -r major minor patch < <(nixv)
  printf '%s=%s\n' major "$major" minor "$minor" patch "$patch"
}
Good catch on -r not mattering here. I think `read` is one I've had trouble with before, and I assumed that was it since I bothered disabling it. But after grepping around a bit more, the only case I can find where shellcheck would object and the behavior actually differs isn't common enough that I'd gripe about it.
I don't actually have a gripe with 2034, here. It's a good example of a statement that the user obviously has to triage.
Likewise, I'm just using `local -p` as a cheap way to show that the variables populate; it wasn't part of the original code I adapted. Yes--the case for coupling to the output of local -p is when the output is going to be used to set variables again later.
Going back to intentionally look through a few things for examples is helping me better clarify why I've had this impression slowly building up...
1. Some Shellcheck code titles/short messages communicate uncertainty well, but some others can read as more confident/absolute than the situation merits.
2. Broadly, Shellcheck tends to steer people away from word-splitting. Word-splitting can be a surprising PITA, so I get it. But, it's also a critical concept for understanding shell.
If you are reluctantly writing Bash/shell, it's good to have Shellcheck help you avoid it. If you want or need to understand the language, you're going to have to grapple with word-splitting to get there.
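A tiny sketch of the behavior I mean (the message text is arbitrary):
msg="two words"
printf '<%s>\n' $msg      # unquoted: word-split into two arguments: <two> <words>
printf '<%s>\n' "$msg"    # quoted: a single argument: <two words>
Shellcheck will (rightly, most of the time) flag the unquoted form, but deliberate splitting like this is also a real tool once you know what it does.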
Fair :) but I think there's a degree difference here (and this is more about the language than Shellcheck itself).
Also, maybe my original phrasing left room for misinterpretation? Shellcheck will happily suggest changes that break working code. I think I've had it happen ~4 times this year?
Obviously it varies a little by tool type and ecosystem--it's not much of a surprise if a dead-code analysis tool suggests removing things you can't actually cut, and likewise linters in most languages can point out unused variables/parameters that you can't cut. But I can't recall the last time another tool gave me specific do-x-not-y suggestions that just weren't anywhere close to fungible.
> Shellcheck will happily suggest changes that break working code.
Which is OK once it is known. Your example shows that nicely. The flagged behavior is often unexpected, so where it is expected, having a shellcheck line that turns off the warning greatly improves readability of the whole script, essentially telling the reader "here is the dependence on non obvious behavior."
Where such dependence exists, I prefer to see the "shellcheck" directive on the line right before, pointing at it by turning that specific warning off for exactly that line. It's then a "vetted" and "explained" line.
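Something like this, with the reason spelled out (SC2086 and the variable names are just an illustrative pick):
# shellcheck disable=SC2086  # EXTRA_FLAGS is intentionally word-split into separate options
tar czf "$archive" $EXTRA_FLAGS "$src_dir"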
I have created a similar library for bash. However, there are more pitfalls related to errexit, see [1], and even shellcheck cannot help there. I have tried to solve the pitfalls in my library, but it turned out to be ugly and unreliable. That's why I'm trying to use rust [2] for shell-scripting-like tasks nowadays.
You've a lot of cool stuff in that repo! I hope you don't mind me 'stealing' some stuff for a kind of booklet I'm working on with several shell tips and tricks.
imho all the behaviors that post outlines as "confusing" are my desired behaviors most of the time. For me it's pretty clear-cut, error out by default and only continue when explicitly allowed with something like `|| true`.
unless the command that fails is part of an until or while loop, part of an if statement, part of a && or || list, or if the command's return status is being inverted using !.
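A small sketch of those exceptions, for anyone who hasn't run into them:
set -o errexit

if false; then             # the failing command is the `if` test: errexit does not trigger
  echo "never reached"
fi
false || echo "handled"    # part of a || list: exempt
! true                     # status inverted with !: exempt even though it "fails"

echo "still running"       # all of the above fall through to here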
Bash is one of my favorite programming languages. Not for production. Writing significant production Bash that others have to use and maintain is more of a struggle than it is worth.
But for personal use? Hitting APIs, organizing data, writing to files... Bash has been enjoyable because it requires much more creativity than almost any other language. At some point it's just you and the coreutils against the world.
These types of things are why I always use a real programming language rather than shell scripting. I can't wrap my head around all the different pitfalls of treating random string outputs as input for other commands without first validating and parsing them into an actual data structure.
Shell scripting has its place. In general it works fairly well and it’s an important skill to have.
I used to have a report who had a similar train of thought, and ended up writing shell scripts in Python, which was way more fragile, unreadable and complicated.
The edge cases are there, but you don’t run into them all that often. Or if you do, you fix your scripts. Take the cp -- example (the 2nd pitfall). You generally don’t start file names with a -. It just makes Unix life annoying so you get pushed away from doing it.
Keep scripts sane and straight forward, you can do really nice and powerful work knowing Unix tools. You can also shoot yourself in the foot some day when you create a file called ‘-rf’.
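And for completeness, the usual escape hatches once that file exists:
touch -- '-rf'   # create the awkward file in the first place
rm -- '-rf'      # -- ends option parsing, so -rf is treated as a filename
# or equivalently: rm ./-rf (a path that doesn't start with a dash)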
Bash is like a spec ops soldier's knife. Sure it could be used to kill someone (Bash on Balls, anyone?), but it's mostly for doing boring stuff like opening MREs (getting around a filesystem) or removing a deep thorn from the skin (removing a stupid "$n/a" string in a one-use TSV).
I wouldn't send a navy seal into battle without one, but it isn't an absolutely critical piece of gear.
Many people have responded to your comment saying that this is bad and that using other languages makes things more complicated when you are doing 'bash-like' things. I'm guessing they haven't actually tried it, because I have used python with the [sh](https://github.com/amoffat/sh) package many times to replace bash scripts, and I have never regretted it.
I definitely relate with this. This is why I do my “scripting” these days in Go. Programs aren’t editable like Python, but compiling a Go program results in a single executable that I can just drop onto a host somewhere, without worrying about interpreter versions and pip and all that crap.
I haven’t found Bash to be much better in practice. You can’t really do much with Bash alone, it’s only as useful as the tools you use it with, the ones you install with your OS package manager. And Bash versions themselves can vary dramatically. Try writing non-trivial Bash scripts that are portable between macOS and Linux...it’s a nightmare.
(Yes I know there are libs for python that can do this, but in general Go is just simpler: it has everything I need for writing utilities out of the box, including way simpler package management, and I can come back 2 years later to some code I wrote and jump back into it with no effort.)
>Bash alone, it’s only as useful as the tools you use it with
This is like half the point for me. It's that I have lots of cli tools I like to use, and many of them are very unix-philosophy like. I want to cobble them together to accomplish a task, and since they already work with shell and pipes etc, it's trivial.
Now, in the alternative, I have to go find some library that does the equivalent, or write my own version of the functions, etc., and lots of times those libraries have their own sets of pitfalls. I like bash because it encourages sticking to the unix philosophy of do one thing and do it well (because you are just using it as glue, not as a material itself). (Also, sometimes I just want something that's easy to read and understand.)
For all those who bash bash as so horrible, unstable, non-prod.... bash scripts aren't perfect, and nobody who uses bash would say so, but bash runs on millions of prod systems across the world and things are fine... so maybe what's needed is a bit less holier-than-thou attitude from those people and a bit more "you do you".
The point is that you still have to make sure those tools are installed on the system, and compatible with the code you wrote. Standard CLI programs we rely on in our Bash scripts can vary dramatically between Linux distributions and OSs. If you want portable Bash code you need to bundle it in a container w/ all of its dependencies, or be very judicious about what programs and program features you use.
I'm not bashing Bash, I use Bash all the time and it's enormously useful. But it has limitations and tradeoffs that make me reach for other things more often.
One of the beauties of bash is that it's _not_ compiled.
Can you read the Go source code from a Go binary? Is the algorithm transparent to the user?
How do you jump back to work on a binary? You have to keep the source code somewhere else (probably in a repo somewhere); now you have an application instead of a script.
> You have to keep the source code somewhere else (probably in a repo somewhere)
You should be doing that for scripts as well! Far too often I've seen problems arise when a team depends on a process but it's just a shell script run by cron from the home directory of the guy who's on vacation.
You could also do `go run main.go` if you want to be close to the source. The Gitlab Runner repo has a "script" run like that[1].
Yeah that's one of the first things I mentioned. Go executables aren't editable. In practice it barely makes a difference, if you can SSH onto a machine then you can SCP a binary onto it too. At that point you edit the source code and press Ctrl+P, Return to rerun "go build -o fix-the-thing main.go && scp fix-the-thing $HOST:/tmp && ssh /tmp/fix-the-thing". And then you've got your code ready to check in to source.
Yup, and something rarely mentioned is the lack of good testing frameworks, plus all the pitfalls when you try to make something portable.
I will admit Python doesn't have great, simple primitives for shell-like tasks but you can create a few functions to make a simple sh("") wrapper.
With Ruby's backtick execution syntax you can blend in shell/system utilities pretty easily. Unfortunately, Ruby isn't installed on as many systems by default, afaik
Here's a Ruby equivalent of looping through mp3 files in the current directory
I find a lot of times someone extends or duplicates a bash script and soon gets over their head in complexity, or doesn’t realize how much easier a Ruby/Python/whatever script would be to change, mostly because it’s so much more readable.
Bash has all kinds of pitfalls, like the difference between checking that a file exists (-e) and checking that it’s larger than 0 bytes (-s).
Then you get people who instead use php to do scripts. Ugh.
This is where perl really shines for me. Regular expressions are front and center, making it so frictionless to slice and dice text. My bash knowledge is so stunted because I bail to perl as soon as the going gets tough. :)
Mine remains: that shell affords capabilities via utilities. Awk (again, with numerous implementations) also offers regex capabilities, some not offered by sed, and missing a few others.
When writing cross-platform scripting, adhering to common standards rises in importance. This is possible, if occasionally limiting.
Perl is influenced by shell scripting and there are several easy ways to run commands, build pipes, redirect and other things shells are good at, like testing and listing files.
And as ugly as Perl looks, it has far fewer pitfalls than shell scripts.
With proper development practices, you can write clean Perl, even if you don't know much about the language. It has the same constructs as most other procedural languages and you can apply the same principles. Object orientation is a bit tricky though.
The thing is, Perl won't help you with discipline. If you want write-only code, Perl will compile it, no problem.
For that reason, it is the language of choice for one-liners and throwaway scripts (and that's how I use it most of the time). But writing clean Perl is perfectly doable, though I usually prefer static languages if maintainable code is a priority (no Perl, no shell, not even Python).
While bash itself has some warts, many of the issues are inherent to spawning child processes for accomplishing tasks. Thus, when possible, I try to use proper libraries rather than invoking commands from any language at all.
The cost you pay for this is verbose and unergonomic interop when your problem domain is executing a bunch of commands. For all their warts, shells shine at being extremely expressive.
It’s usually not worth the effort to shove everything into a real programming language; better to write a few self-contained commands designed to be called from a shell and then use the shell as the glue.
Years ago I spent a lot of time (arguably too much time) writing (ba)sh scripts and this document was a great resource! But my main lesson from that period was that bash pitfalls are readability/maintenance nightmares, compounded by the fact that most people don't know about most of the pitfalls. To make matters worse, you can't know which pitfalls are or aren't known to the original author / future maintainer of the script in front of you.
But I'm not sure to what extent the language warts are inevitable consequences of what makes it so expressive and powerful. For example: unix pipes are amazing and the bash syntax for them is quite expressive. I can reproduce the same behavior in any of the more "sane" languages but I find that the equivalent code is a less readable procedural soup of string munging, control flow, and error control.
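For example, something like this stays a readable one-liner in shell (the log file name and field position are made up), whereas the equivalent in most "sane" languages tends to become exactly that soup:
# top 10 most frequent values of the 3rd whitespace-separated field
awk '{print $3}' access.log | sort | uniq -c | sort -rn | head -10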
Is there a greater principle behind all of these pitfalls, a mental model that I can learn and use to avoid these kinds of mistakes? Or is it sort of like CSS where years and years of ad-hoc development make it an exercise in memorizing edge-cases?
* On Linux, file and directory names can be anything except NUL and the / separator. Because of how much software already assumes they’re UTF-8 you can probably safely assume it too and not run into much trouble. But since bash basically just does some text substitutions and then calls eval, you have to be defensive.
Pretty much this, and as a subset of it: quoting, word splitting, the IFS variable and friends. Reading the BashGuide [1] cover to cover is an excellent base, I find.
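The defensive idiom that eventually becomes muscle memory, roughly (the `process` command here is a stand-in):
# handles filenames with spaces, newlines, leading dashes, etc.
find . -type f -name '*.log' -print0 |
while IFS= read -r -d '' file; do
  process -- "$file"   # note the quotes and the --
done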
Do you know if the `--` trick mentioned in the article to stop option scanning works with all bash programs? I thought parsing was up to the program itself. Is it just a widely used convention?
Talking of mental models: These are not "bash programs". Most of the ones where you need this mechanism have nothing to do with the Bourne Again shell, and are just generally-usable utility programs, usable even by things which aren't shells at all.
They are, generally, programs that happen to use library mechanisms such as (but not limited to) getopt() or the popt library to parse their arguments.
The most important thing for understanding shell (without loads of special cases) is understanding the Unix kernel, as well as the kernel interface that the shell uses (i.e. which syscalls).
If you strace the shell, that goes a long way. If you haven't used strace or don't know what it is, I recommend learning.
These comics give more details, and so do Julia Evans' other comics (I think the first one was about strace):
For example this will help you understand that filenames and argv arrays are byte strings in Unix, which may be different than the notion of strings in your programming language.
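A quick way to see it for yourself (output trimmed by hand; the .txt filenames are made up, and exact formatting varies by strace version):
$ strace -f -e trace=execve bash -c 'grep -i foo *.txt'
execve("/bin/bash", ["bash", "-c", "grep -i foo *.txt"], ...) = 0
execve("/usr/bin/grep", ["grep", "-i", "foo", "a.txt", "b.txt"], ...) = 0
The glob never reaches grep; the shell expands it into separate argv entries, each of which is just a byte string.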
-----
That will go a long way. However there is still a ton of accidental and needless complexity in bash, particularly related to string processing, and which is unrelated to the kernel. Hence the Oil project:
I think it is different than CSS here. With CSS I could never really understand how it worked, and each new browser release introduced new concepts that added more and more corner cases to the mix. On the other hand, with shell scripting, after a while I got a decent mental model of how things worked. It is just that there are some bad design choices that make it very easy to shoot yourself in the foot.
For example, word splitting "feature" of POSIX shell is the source of many of the pitfalls in the list and the solution in those cases is always the same -- always quote your variables.
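The classic demonstration (filename chosen to make the breakage obvious):
file="my report.txt"
touch "$file"
cat $file      # word-split into two args, "my" and "report.txt" -> two errors
cat "$file"    # one arg, works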
By the way, check out Shellcheck if you haven't already. It can detect many of these common pitfalls and the error messages have a link to the explanation so it also works as a learning tool.
Seeing this wiki hugged to death by HN makes me realize that it would be perfect for migrating to some simple cloud hosting and using static markdown in git.
All their maintainers are already technical enough to handle doing updates in git. And with certain static doc generators like mkdocs you can have an Edit link in each page that brings the user to a Gitlab repo with an editor and Preview.
Throwing everything at another company so you can blame your outages on them does not a better engineer make.
This site is just doing something silly dynamically for something that should be static (e.g. reading from a database). A single modern computer can easily handle the HN hug of death serving static files.
Shellcheck is, in general, one of the most undervalued dev tools hiding in the bushes. Especially as everyone has to shell at least a little bit, in particular linux newbies.
With sed, in some cases, you can. One of the differences between Linux-provided sed and BSD-provided sed is the -a option.
FreeBSD manpage:
"-a The files listed as parameters for the "w" functions are created (or truncated) before any processing begins, by default. The -a option causes sed to delay opening each file until a command containing the related "w" function is applied to a line of input."
So let's say you have a file:
cat > file
sandfoo
^D
With the BSD-provided sed:
cat file|sed -na 's/foo/bar/;H;$!d;g;w'file
The replacement is made without any temp file.
Note this example will add a blank line at the top.
Why use sed? sed is part of the BSD base system, the toolchain^1 and install images, thus it can easily be utilised early in the boot process. I am not aware that "sponge" is part of any base system.
1. NetBSD's toolchain can be reliably cross-compiled on Linux.
sed -i creates a temp file. Needs additional storage space the size of the file.
This can pose a problem with large files on smaller computers with limited storage and memory.
In the past, some BSD seds (other than FreeBSD) did not have the -i option.
Also worth noting that "sponge", unlike sed, apparently reads the entire file into memory. For a large file, this would require enough RAM to fit the entire file.
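For reference, the moreutils sponge pattern being compared against looks like this (app.log is a made-up example); it soaks up all of its input before opening the output file, which is why it can rewrite a file in place at all:
grep -v 'debug' app.log | sponge app.log   # no explicit temp file on your part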
Most of the pitfalls described in this doc are /bin/sh pitfalls, not bash per se.
On that note, let me rant: stop writing bash scripts people. If POSIX shell is not good enough, it's a good sign that you should move to a more expressive language with better error handling. Bash is a crappy compromise: it's not as portable as POSIX but it's also a very crappy scripting language.
You might say I'm a pedant because "bash is everywhere nowadays anyway, your /bin/sh might even be bash in disguise" but I'll be the one having the last laugh the day you have to script something in some busybox or bash-less BSD environment. Knowing how to write portable shell scripts is a skill I value a lot personally, but that might be because I work a lot with embedded systems.
Also if you assume that your target system will always have bash, there's a very good chance that it'll also have perl and/or python too. So no excuses.
This is basic nonsense. Design your scripts and provision your environments. _You_ can stop writing bash scripts. I find them eminently useful and I control/provision the environments they run in. If you are not in control then do what you are told. Most people would puke if I told them that a lot of csh is alive and well in sci-prod environments, but that is the way it is. You _do_ tailor your environment bc that is the way *nix was designed. Controlling people and people's habits is faang land.
The biggest transformative learning experience for me in Unix command gluing was to understand argument vectors: that a command execution is not just one big string, but will, at the point of calling the command, be transformed by Bash into a vector of strings to be passed to the execve system call.
The point is, all those quotes etc. are syntax we use to tell the Bash interpreter what the strings passed on should be. Think in terms of the string vector and not in terms of the one big string in front of your eyes that is a line of Bash code. The latter is a recipe for memorizing magic incantations and cargo-cult sprinkling of quotes here and there.
The horrible thing with Bash is that while normal programming languages make the difference crystal clear between the name of a function being called and its mandatorily quoted string args neatly separated with commas, Bash will come up with this split on the spot using arcane rules.
Letting a string automatically fall apart into multiple arguments on the spot may have seemed convenient in the old days for forwarding invocations, but it seems clear to me that this was a design mistake.
In the end if you want to have robust Bash code you have to keep track of string splitting and will almost never rely on the default behavior of letting Bash "helpfully" pre-digest and split things up for you.
Much of Bash expertise is then about how to fight vigorously against the default behavior of silently chunking up your args. But fighting really hard means using e.g. arrays with ugly syntax, and other magic-looking constructs. It shouldn't take so much alertness just to keep to sane behavior. The problem is, the interpreter is actively hostile to your endeavor and will explicitly try to screw with you.
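A small way to make that vector visible (the function name is arbitrary):
show_argv() {
  local i=0 arg
  for arg in "$@"; do
    i=$((i + 1))
    printf 'argv[%d]=<%s>\n' "$i" "$arg"
  done
}
x="one two"
show_argv $x      # unquoted: the function receives two separate strings
show_argv "$x"    # quoted: one string, spaces preserved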
I was writing a script recently and I was going to use `cp -- "$src" "$dst"` but neither `man cp` nor `cp --help` mentioned `--` as an option. I'm not sure I'm using info right, but it doesn't seem to be mentioned in `info cp`, either.
Is that feature described somewhere else, or is it just undocumented?
The -- mechanism in general, not specific to the cp command albeit mostly limited to programs that use getopt/popt for their argument parsing, is described all over the place. I first encountered it in a book, written by Eric Foxley, entitled UNIX for Super-Users and published in 1985.
* For the cp command that is part of Rob Landley's toybox, this is covered in the introduction to the toybox(1) manual page, and there isn't really a distinct cp(1) manual page. (http://landley.net/toybox/help.html)
* For an operating system where this is actually in the cp(1) manual itself, look to Solaris, and its successor Illumos, where you'll find it in the NOTES section. (https://illumos.org/man/1/cp)
That is a far more comprehensive answer than I was expecting!
You're absolutely right about my system. There is a link to the coreutils common options in the cp info page. I missed that when I was searching through it.
Oh neat. I found a problem with a script I wrote today because of this.
I wrote a script recently that takes several screenshots from the same folder, merges them using Image Magick and outputs the merged image into a PDF. Found out through the first tip that I shouldn't be using ls: https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_....
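The shape of the fix, roughly (the paths and the ImageMagick invocation here are illustrative):
shopt -s nullglob                 # an empty match becomes an empty array, not a literal *
files=(./screenshots/*.png)
(( ${#files[@]} )) && convert "${files[@]}" merged.pdf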
One new pitfall I learned yesterday is that sourcing a file searches $PATH. I ran a tool that did ". config" on OpenBSD and it tried to source config(8) instead of its configuration in the local directory. Oops.
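The fix, for anyone bitten by the same thing, is to be explicit about the path:
. ./config   # explicitly the file in the current directory
# rather than
. config     # may resolve via $PATH first (which is how config(8) got picked up)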
Whenever sh is not good enough and I have to switch to bash, I'd rather rewrite it in proper Perl. Bash really looks like a hack, and not all environments have bash. Everybody has a conforming sh.
I was half-expecting a page with the bash grammar. It's super easy to make mistakes in that language. I'm glad there are things like xonsh around trying to act as sane alternatives.
It's meh because like many things, you actually have to read the fine manual if you want stuff to actually work.
It's interesting because it's always nice to have a refresher and/or reminder of various bash things.
That being said... I've said this many times, I'll say it again: bash scripts are software, and thus must not be exempted from various programming best practices. The first thing that comes to my mind is input validation and error checking.
Bash is not magical. It's one of those things that have to be mastered through practice... Just like many other programming languages.
Yea. Or test the heck out of it. Once it works, it tends to work well.
It took me ~8 years (so far, I'm sure there is a bug somewhere) and considerable #bash advice to make a simple script that locks commands+args https://github.com/jakeogh/commandlock
I also wrote an MDA wrapper for gnupg, and again, it took years to feel good about having my mail filter through it https://github.com/jakeogh/gpgmda
Now that py has pathlib, if it's more than a few lines, py with @click is just way better. All that is really missing is cleaner copy/move abstractions.
There are also some features that are missing from all shells... like the ability of a script to know if expansion happened before it got $@... that internal state just isn't exposed.