Bash-ing our way to a project’s base directory
Do you want your script to be told where it needs to run? Or would you rather it work things out for itself? Personally, I find the more a script can work out about its runtime environment on its own, the better.
The goal: simplify crontab entries
A classic example of this at work is wrapper scripts for cron jobs. A lot of our software uses Python, and to stop the dependencies of one project from influencing those of other projects, we use virtual environments to keep things nicely separated. Unfortunately, this can make the commands in crontab entries long and complex, because we need to change to the relevant directory, activate the virtual environment and then run the program. For instance:
0 * * * * flock -n /tmp/some-prog.lock -c "cd /path/to/project/base/dir; . venv/bin/activate && python some-prog.py"
To make things simpler, I tend to wrap this process into a script. Something like this:
0 * * * * flock -n /tmp/some-prog.lock -c "/path/to/project/base/dir/run-prog"
That’s shorter, easier to read when looking at the crontab, and makes changing how the final program is run easier because this information is explicitly defined in a wrapper script which itself is in source code control within the relevant project. Unfortunately, that means the script needs to work out which path to use to run the main program on its own.
One solution: Perl’s FindBin module
My go-to solution in the past was to use Perl’s FindBin module, i.e. something along these lines:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin qw($RealBin);
my $base_dir = "$RealBin/..";
chdir $base_dir or die "$!";
die "Python virtualenv not set up, exiting" unless -d "venv";
my $venv_setup = ". venv/bin/activate";
my $command = "$venv_setup; python some-command-or-other.py";
my $retval = system $command;
if ( $retval != 0 ) {
    # $? holds the wait status from system; $! is only meaningful when system returns -1
    warn "command execution failed; wait status: $?";
}
(Yes, my Perl could be better; I’m a bit rusty, ok?)
There are problems with this solution though: most of my colleagues don’t know Perl, and the FindBin module isn’t part of the base Perl installation on Debian, so we have to ensure all production nodes have this extra dependency.[1] One day my brain said “there must be another way of doing this” and so I went looking. Sure enough, it’s possible in plain bash. This means I can remove the FindBin dependency and reduce the number of languages we use at work by one, which definitely helps because it reduces the potential cognitive load for the whole team.
A bit simpler: all in bash
Eventually, I landed on a solution adapted from a well-written answer about BASH_SOURCE on StackOverflow. It looks basically like this:
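#!/bin/bash
# Absolute path of the directory containing this script. Resolving it with
# cd/pwd keeps the second dirname correct even when the script is run as
# ./run-prog from inside bin/ (where a plain dirname would only give ".").
BIN_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) || exit
BASE_DIR=$(dirname "$BIN_DIR")
cd "$BASE_DIR" || exit
# create the virtual environment on first run
if [ ! -d venv ]
then
    virtualenv --python=/usr/bin/python3 venv
fi
# shellcheck source=/dev/null # don't check venv activate script
source venv/bin/activate
python some-command-or-other.py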
(Yes, we do use shellcheck for our bash scripts at work. It’s great to have a linter for shell scripts!)
I’m also vaguely sure it’s possible to remove all knowledge of where the wrapper script is on the filesystem (possibly by putting a symlink or similar into /usr/local/bin), but the above solution works well enough for my purposes at present.
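I haven’t tried the symlink idea in production, but here is a minimal sketch of how it might work. It assumes GNU readlink (the -f option resolves symlinks to the real file) and that the real wrapper lives in the project’s bin/ directory. The function name base_dir_of and the throwaway directory layout are made up for this illustration.

```shell
#!/bin/bash
# Resolve any symlinks in a script path and print the project base directory,
# assuming the real script lives in <base>/bin/. Relies on GNU readlink -f.
base_dir_of() {
    local real_path
    real_path=$(readlink -f "$1") || return
    dirname "$(dirname "$real_path")"
}

# In a wrapper script one would call: base_dir_of "${BASH_SOURCE[0]}"
# Demonstration with a throwaway layout (all names are hypothetical):
tmp_dir=$(readlink -f "$(mktemp -d)")
mkdir -p "$tmp_dir/project/bin" "$tmp_dir/local-bin"
touch "$tmp_dir/project/bin/run-prog"
ln -s "$tmp_dir/project/bin/run-prog" "$tmp_dir/local-bin/run-prog"

# Ask for the base directory via the symlink, not the real file.
resolved_base=$(base_dir_of "$tmp_dir/local-bin/run-prog")
echo "$resolved_base"

rm -r "$tmp_dir"
```

The point is that the symlink’s own location (local-bin/ here) drops out of the picture once the link is resolved, so the wrapper still ends up in the project’s base directory.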
So what’s going on here? The short answer is that the first element of the BASH_SOURCE array is the path to the wrapper script, and we can use that information to put us in the right place to run the main program.
The main point is that BASH_SOURCE does what FindBin also does, because BASH_SOURCE contains path information about the script. Thus we’re free to call the script from wherever we want: the directory the wrapper script is called from is now wonderfully irrelevant for the main program to run.
Note that since ${BASH_SOURCE[0]} points to the name of the script, but we want to know the directory in which it resides, we call dirname on the value to get its directory name. In the particular case shown above, my wrapper scripts are located within the project’s bin/ directory, which is one level down from the project’s base directory, so we call dirname once more to get the name of the project’s base directory. After that it’s a simple matter of creating the virtual environment (if necessary), activating it and then running the main program.
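To make the two dirname calls concrete, here’s a minimal sketch using a made-up path (/path/to/project/bin/run-prog is purely illustrative, as are the variable names). It also shows something worth knowing: dirname only manipulates the text of the path, so a bare relative name gives "." both times.

```shell
#!/bin/bash
# Hypothetical location of a wrapper script, for illustration only.
script_path="/path/to/project/bin/run-prog"

bin_dir=$(dirname "$script_path")    # /path/to/project/bin
base_dir=$(dirname "$bin_dir")       # /path/to/project

echo "bin dir:  $bin_dir"
echo "base dir: $base_dir"

# dirname is purely textual: a name with no directory part yields ".",
# and dirname "." is still ".", not the parent directory.
relative_bin_dir=$(dirname "run-prog")             # .
relative_base_dir=$(dirname "$relative_bin_dir")   # .

echo "relative bin dir:  $relative_bin_dir"
echo "relative base dir: $relative_base_dir"
```

If the wrapper might be run as ./run-prog from inside bin/, one option is to make the directory absolute first (for example with cd and pwd) before taking its parent.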
To understand things more completely (and not just rely on StackOverflow for all explanations), I like going back to the original documentation to get a good feeling for how to use a new concept. However, I found the bash variables docs (where the BASH_SOURCE variable is mentioned) not to be overly clear. The trick is to read the docs for the FUNCNAME variable to get the gist of what the BASH_SOURCE documentation is talking about. In the end, the sentence:
The element with index 0 is the name of any currently-executing shell function.
is the one that helps us the most: paraphrasing the docs somewhat, this means that, in the context of BASH_SOURCE, the element with index 0 is the name of the currently-executing script.[2] This is why we use the first element of the BASH_SOURCE array to locate the script’s directory.
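Here’s a small sketch of how the two arrays line up, as far as I understand the docs: inside a function, ${FUNCNAME[0]} is the function’s name and ${BASH_SOURCE[0]} is the file in which that function was defined. The file name lib.sh and the function name show_location are invented for this example. The last part shows the case where no named file is involved (a command string passed to bash -c) and the element is empty.

```shell
#!/bin/bash
# Write a throwaway file (hypothetical name lib.sh) that defines a function.
tmp_dir=$(mktemp -d)
cat > "$tmp_dir/lib.sh" <<'EOF'
show_location() {
    echo "${FUNCNAME[0]} is defined in ${BASH_SOURCE[0]}"
}
EOF

# Load the function, then call it: index 0 of each array describes the
# function that is currently executing.
source "$tmp_dir/lib.sh"
function_output=$(show_location)
echo "$function_output"

# With no named file involved (here: a command string), the element is empty.
no_file_output=$(bash -c 'echo "[${BASH_SOURCE[0]}]"')
echo "$no_file_output"

rm -r "$tmp_dir"
```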
There’s always more than one way to do it
It turns out it’s possible to simply use the $0 variable to get the script’s path information. However, this doesn’t work in all scenarios, in particular when the script is being sourced as opposed to being run. Therefore, using the slightly longer ${BASH_SOURCE[0]} invocation is the safer option.
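The difference is easy to see with a small experiment. This is only a sketch: the file name whereami.sh is made up, and the script writes it to a temporary directory so that it can be both run and sourced.

```shell
#!/bin/bash
# Write a small throwaway script (hypothetical name whereami.sh) to a temp dir.
tmp_dir=$(mktemp -d)
cat > "$tmp_dir/whereami.sh" <<'EOF'
echo "\$0=$0 BASH_SOURCE=${BASH_SOURCE[0]}"
EOF

# Executed: both $0 and BASH_SOURCE[0] refer to the script file.
run_output=$(bash "$tmp_dir/whereami.sh")

# Sourced: $0 still names the calling shell or script, whereas
# BASH_SOURCE[0] names the file that is being sourced.
sourced_output=$(source "$tmp_dir/whereami.sh")

echo "run:     $run_output"
echo "sourced: $sourced_output"

rm -r "$tmp_dir"
```

When run, both values are the path to whereami.sh. When sourced, only BASH_SOURCE still points at whereami.sh; $0 shows whatever started the calling shell.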
I recently stumbled across jessitron’s solution (${BASH_SOURCE%/*} is the directory containing the script), which is cool, and I learned something new about bash when reading it. However, I find it very dense, which can make the code hard to read for people who aren’t aware of how e.g. % and /* interact in this context.
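For anyone in that position, here’s a short sketch of what the expansion does, compared with dirname. The paths and variable names are made up for illustration. Note also that $BASH_SOURCE without an index refers to element 0 of the array.

```shell
#!/bin/bash
# Hypothetical script paths, for illustration only.
with_dir="/path/to/project/bin/run-prog"
no_dir="run-prog"

# ${var%pattern} removes the shortest suffix matching the glob pattern;
# here the pattern is "/*", i.e. the last slash and everything after it.
expansion_with_dir=${with_dir%/*}          # /path/to/project/bin
dirname_with_dir=$(dirname "$with_dir")    # /path/to/project/bin

# With no slash in the value nothing matches, so the expansion returns the
# file name unchanged, whereas dirname returns ".".
expansion_no_dir=${no_dir%/*}              # run-prog
dirname_no_dir=$(dirname "$no_dir")        # .

echo "$expansion_with_dir $dirname_with_dir"
echo "$expansion_no_dir $dirname_no_dir"
```

So the two approaches agree whenever the path contains a slash, and differ when the script is referred to by a bare file name.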
In the end, just using dirname on the first element of the BASH_SOURCE array is clear as well as being nice and direct.
Why not use Python? After all, the main code is in Python, surely one would use Python for the wrapper too. Well, running a virtualenv Python from within system Python can get confusing, so it makes more sense (to me, at least) to use a separate language to reduce potential confusion. Different things should look different after all.
Wrapping up
And that’s it! Just use ${BASH_SOURCE[0]} to work out where you are and control where you want to be.
[1] It’s not like it’s hard to install, it’s just one more thing to think of when creating the Ansible configuration for a server.
[2] There’s a gotcha here though: ${BASH_SOURCE[0]} could be empty if no named file is involved.
