Bash-ing our way to a project’s base directory


Do you want your script to be told where it needs to run? Or would you rather it work things out for itself? Personally, I find the more a script can work out about its runtime environment on its own, the better.

The goal: simplify crontab entries

A classic example of this from my work is wrapper scripts for cron jobs. Much of our software is written in Python and, to stop the dependencies of one project from influencing those of another, we use virtual environments to keep things nicely separated. Unfortunately, this can make the commands in crontab entries long and complex, because each one has to change into the relevant directory, activate the virtual environment and then run the program. For instance:

0 * * * * flock -n /tmp/some-prog.lock -c "cd /path/to/project/base/dir; . venv/bin/activate && python some-prog.py"

To make things simpler, I tend to wrap this process into a script. Something like this:

0 * * * * flock -n /tmp/some-prog.lock -c "/path/to/project/base/dir/run-prog"

That’s shorter, easier to read when looking at the crontab, and makes it easier to change how the final program is run, because this information is defined explicitly in a wrapper script which itself lives in source control within the relevant project. Unfortunately, it also means the script needs to work out on its own which directory to run the main program from.

One solution: Perl’s FindBin module

My go-to solution in the past was to use Perl’s FindBin module; i.e. something along these lines:

#!/usr/bin/env perl

use strict;
use warnings;

use FindBin qw($RealBin);

my $base_dir = "$RealBin/..";
chdir $base_dir or die "$!";

die "Python virtualenv not set up, exiting" unless -d "venv";

my $venv_setup = ". venv/bin/activate";
my $command = "$venv_setup; python some-command-or-other.py";

my $retval = system $command;
if ( $retval != 0 ) {
    # $! is only meaningful when system returns -1, so report the wait status
    warn "command execution failed; wait status: $retval";
}

(Yes, my Perl could be better; I’m a bit rusty, ok?)

There are problems with this solution though: most of my colleagues don’t know Perl, and the FindBin module isn’t part of the base Perl installation on Debian, so we have to ensure all production nodes have this extra dependency.[1] One day my brain said “there must be another way of doing this”, so I went looking and, sure enough, it’s possible in plain bash. This means I can remove the FindBin dependency and reduce the number of languages we use at work by one, which lowers the cognitive load for the whole team.

A bit simpler: all in bash

Eventually, I landed on a solution adapted from a well-written answer about BASH_SOURCE on StackOverflow and it looks basically like this:

#!/bin/bash

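# Resolve bin/ to an absolute path, so that the second dirname also works
# when the script is called as ./run-prog from inside bin/
BIN_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) || exit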
BASE_DIR=$(dirname "$BIN_DIR")
cd "$BASE_DIR" || exit

if [ ! -d venv ]
then
    virtualenv --python=/usr/bin/python3 venv
fi

# shellcheck source=/dev/null  # don't check venv activate script
source venv/bin/activate
python some-command-or-other.py

(Yes, we do use shellcheck for our bash scripts at work. It’s great to have a linter for shell scripts!)

I’m also vaguely sure it’s possible to remove all knowledge of where the wrapper script is on the filesystem (possibly by putting a symlink or similar into /usr/local/bin); however, the above solution works sufficiently well for my purposes at present.
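For what it’s worth, here is a rough sketch of how the symlink idea might look. It is not something I run in production: it assumes a readlink that supports -f (GNU coreutils does), and the demo layout (project/, links/, run-prog) is made up for illustration. The wrapper follows the symlink back to the real script before working out the base directory.

```bash
# Build a throwaway layout: the real wrapper in project/bin/, a symlink in links/.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/project/bin" "$demo_root/links"

cat > "$demo_root/project/bin/run-prog" <<'EOF'
#!/bin/bash
# Follow any symlinks to the real script before taking dirname.
# Assumption: readlink supports -f (true for GNU coreutils).
REAL_SCRIPT=$(readlink -f "${BASH_SOURCE[0]}")
BASE_DIR=$(dirname "$(dirname "$REAL_SCRIPT")")
cd "$BASE_DIR" || exit
basename "$PWD"
EOF

ln -s "$demo_root/project/bin/run-prog" "$demo_root/links/run-prog"

# Run the wrapper through the symlink; it should still end up in project/.
via_symlink=$(bash "$demo_root/links/run-prog")
echo "$via_symlink"

rm -rf "$demo_root"
```

Without the readlink step, the wrapper would take the symlink’s own directory as its bin/ directory and end up in the wrong place.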

So what’s going on here? The short answer is that the first element of the BASH_SOURCE array is the path to the wrapper script and we can use that information to put us in the right place to run the main program.

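The main point is that BASH_SOURCE gives us what we were using FindBin for, because BASH_SOURCE contains path information about the script. (One difference: FindBin’s $RealBin resolves symlinks, whereas BASH_SOURCE holds the path as the script was invoked.) Thus we’re free to call the script from wherever we want: the directory we happen to be in when the wrapper script is called is now wonderfully irrelevant for the main program to run.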
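To convince myself of the “call it from wherever” claim, a small experiment along these lines helps. This is only a sketch: the temporary project layout and the whereami script are hypothetical names invented for the demo.

```bash
# Build a throwaway project layout under a temp directory.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/project/bin"

# A tiny wrapper that only reports the directory it lives in.
cat > "$demo_root/project/bin/whereami" <<'EOF'
#!/bin/bash
cd "$(dirname "${BASH_SOURCE[0]}")" || exit
pwd -P
EOF

# Call the wrapper from two unrelated working directories.
from_root=$(cd / && bash "$demo_root/project/bin/whereami")
from_bin=$(cd "$demo_root/project/bin" && bash whereami)

echo "called from /:    $from_root"
echo "called from bin/: $from_bin"

rm -rf "$demo_root"
```

Both lines should report the same bin/ directory, regardless of where the caller was standing.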

Note that since ${BASH_SOURCE[0]} points to the name of the script, but we want to know the directory in which it resides, we call dirname on the value to get its directory name. In the particular case shown above, my wrapper scripts are located within the project’s bin/ directory which is one level down from the project’s base directory, therefore we call dirname once more to get the name of the project’s base directory. After that it’s a simple matter of creating the virtual environment (if necessary), activating the virtual environment and then running the main program.
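The two dirname calls can be tried out in isolation. In this sketch, base_dir_of is a hypothetical helper (not part of the wrapper above); it only manipulates the path as a string, so nothing has to exist on disk.

```bash
# Hypothetical helper: given the path to a wrapper script in <base>/bin/,
# print <base>.
base_dir_of() {
    local script_path=$1
    local bin_dir
    bin_dir=$(dirname "$script_path")   # strip the script name
    dirname "$bin_dir"                  # strip the bin/ component
}

base_dir_of /path/to/project/bin/run-prog   # prints /path/to/project
base_dir_of bin/run-prog                    # prints .
base_dir_of ./run-prog                      # prints . (not ..)
```

The last case is worth knowing about: when the path has no real directory part, dirname has nothing left to strip and answers with “.”, so calling dirname twice only moves up a level if the first result is an actual directory path. Resolving the bin/ directory to an absolute path first (for instance with cd and pwd) sidesteps this.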

To understand things more completely (and not just rely on StackOverflow for all explanations) I like going back to the original documentation to get a good feeling for how to use a new concept. However, I found the bash variables docs (where the BASH_SOURCE variable is mentioned) not overly clear. The trick, I found, is to read the docs for the FUNCNAME variable to get the gist of what the BASH_SOURCE documentation is talking about. In the end, the sentence:

The element with index 0 is the name of any currently-executing shell function.

is the one that helps us the most: paraphrasing the docs somewhat, this means that, in the context of BASH_SOURCE, the element with index 0 is the name of the currently-executing script.[2] This is why we use the first element of the BASH_SOURCE array to locate the script’s directory.
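The way FUNCNAME and BASH_SOURCE line up is easier to see in a small experiment. The following is just a sketch with made-up file names (lib.sh, main.sh) and a made-up function (report): a function defined in one file is called from another, and each prints what index 0 holds at that point.

```bash
demo_dir=$(mktemp -d)

cat > "$demo_dir/lib.sh" <<'EOF'
report() {
    # Inside a function, index 0 of both arrays describes the function itself:
    # its name, and the file it was defined in. Index 1 describes the caller.
    echo "function=${FUNCNAME[0]} defined_in=$(basename "${BASH_SOURCE[0]}") called_from=$(basename "${BASH_SOURCE[1]}")"
}
EOF

cat > "$demo_dir/main.sh" <<'EOF'
source "$(dirname "${BASH_SOURCE[0]}")/lib.sh"
# At the top level of a script, index 0 is the script file itself.
echo "top level: $(basename "${BASH_SOURCE[0]}")"
report
EOF

output=$(bash "$demo_dir/main.sh")
echo "$output"

rm -rf "$demo_dir"
```

At the top level of main.sh, element 0 is main.sh; inside report, element 0 is lib.sh, because that is where the function lives. In a wrapper script we read the variable at the top level, which is why we get the wrapper’s own path.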

There’s always more than one way to do it

It turns out it’s possible to simply use the $0 variable to get the script’s path information; however, this doesn’t work in all scenarios, in particular when the script is being sourced as opposed to being run. Therefore, using the slightly longer ${BASH_SOURCE[0]} invocation is the safer option.
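The difference between the two shows up as soon as a file is sourced. Here is a minimal sketch, using hypothetical file names (show.sh and caller.sh), that prints both values when show.sh is executed directly and when it is sourced from another script.

```bash
demo_dir=$(mktemp -d)

# show.sh prints what $0 and BASH_SOURCE[0] contain (file names only).
cat > "$demo_dir/show.sh" <<'EOF'
echo "\$0=$(basename "$0") BASH_SOURCE=$(basename "${BASH_SOURCE[0]}")"
EOF

# caller.sh does nothing but source show.sh.
cat > "$demo_dir/caller.sh" <<'EOF'
source "$(dirname "${BASH_SOURCE[0]}")/show.sh"
EOF

executed=$(bash "$demo_dir/show.sh")
sourced=$(bash "$demo_dir/caller.sh")

echo "executed: $executed"   # $0=show.sh BASH_SOURCE=show.sh
echo "sourced:  $sourced"    # $0=caller.sh BASH_SOURCE=show.sh

rm -rf "$demo_dir"
```

When sourced, $0 still names the script that was originally started, whereas BASH_SOURCE[0] names the file whose code is actually running.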

I recently stumbled across jessitron’s solution (${BASH_SOURCE%/*} is the directory containing the script), which is cool, and I learned something new about bash when reading it. However, I find it very dense, which can make the code hard to read for people who aren’t aware of how e.g. % and /* interact in this context. In the end, just using dirname on the first element of the BASH_SOURCE array is clear as well as being nice and direct.
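For anyone else puzzled by the % form: ${var%pattern} removes the shortest match of pattern from the end of the value, and /* matches a slash followed by anything, so together they chop off the last path component. The sketch below compares it with dirname on a few made-up paths; strip_last_component is a hypothetical helper name used only for the comparison.

```bash
# Hypothetical helper wrapping the parameter expansion, for comparison only.
strip_last_component() {
    echo "${1%/*}"
}

for script_path in /path/to/project/bin/run-prog bin/run-prog run-prog; do
    echo "$script_path: expansion=$(strip_last_component "$script_path") dirname=$(dirname "$script_path")"
done
```

The two agree whenever the path contains a slash. With a bare run-prog there is nothing for the pattern to match, so the expansion leaves the value untouched, whereas dirname answers with “.”.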

Why not use Python? After all, the main code is in Python, so surely one would use Python for the wrapper too. Well, running a virtualenv Python from within system Python can get confusing, so it makes more sense (to me, at least) to use a separate language to reduce potential confusion. Different things should look different after all.

Wrapping up

And that’s it! Just use ${BASH_SOURCE[0]} to work out where you are and control where you want to be.

  1. It’s not like it’s hard to install, it’s just one more thing to think of when creating the Ansible configuration for a server. 

  2. There’s a gotcha here though: ${BASH_SOURCE[0]} could be empty if no named file is involved.
