Processing files with spaces in filenames
Sometimes I get files from friends who use certain graphical operating systems, where it’s ok to use spaces in filenames. Processing these files on Unix isn’t that much fun, since the shell uses spaces to separate a command’s arguments. Thus it sucks to try to process a list of files and get errors about files called My, Cool and File.txt not existing,
when really the file
My Cool File.txt should have been read and processed.
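To see why this happens, here is a minimal bash sketch. It assumes a throwaway directory created with mktemp and a single hypothetical file called My Cool File.txt; the function name show_split is made up for illustration. It loops over the unquoted output of $(ls), which is exactly the pattern that breaks:

```bash
#!/usr/bin/env bash
# Show how an unquoted $(ls) splits a filename containing spaces.
# show_split is a hypothetical helper; its body runs in a subshell,
# so the cd below doesn't affect the caller.
show_split() (
    dir=$(mktemp -d) || exit 1   # scratch directory, removed at the end
    cd "$dir" || exit 1
    touch "My Cool File.txt"     # hypothetical example file
    for name in $(ls); do        # word splitting on spaces happens here
        printf '[%s]\n' "$name"  # brackets make each separate word visible
    done
    cd / && rm -rf "$dir"
)

show_split
```

This prints [My], [Cool] and [File.txt] on separate lines: each word is handed to the command on its own, which is why none of them can be found.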
How to get around this? My usual procedure is simply to convert the spaces
to underscores by hand:
This is slow and error-prone, since one needs to remember to escape every
space; for one-off instances, though, it is a pragmatic option. If there is
a long list of such files, however, it is amazingly tedious. So, for once, I
decided to sit down and work out how to convert the spaces to underscores
with a for loop in bash.
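For a single file, a by-hand rename looks something like the following minimal bash sketch. The filenames and the rename_by_hand function are hypothetical, and everything happens in a throwaway mktemp directory. Forgetting even one backslash is enough to make mv see extra arguments, which is where the error-proneness comes from:

```bash
#!/usr/bin/env bash
# Rename single files by hand: escape every space, or quote the whole name.
# rename_by_hand is a hypothetical helper that works in a scratch directory.
rename_by_hand() (
    dir=$(mktemp -d) || exit 1   # scratch directory, removed at the end
    cd "$dir" || exit 1
    touch "My Cool File.txt" "Another Cool File.txt"  # hypothetical files
    mv My\ Cool\ File.txt My_Cool_File.txt            # backslash-escaped spaces
    mv "Another Cool File.txt" Another_Cool_File.txt  # quoted name, same effect
    ls                                                # list the new names
    cd / && rm -rf "$dir"
)

rename_by_hand
```

Both forms pass the whole name to mv as one argument; the listing shows Another_Cool_File.txt and My_Cool_File.txt.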
Internal Field Separator to the rescue
After spending ages trying to work out how to, e.g., pipe the output of
ls into a bash array, or use some of the various quoting options to ls 1,
I eventually stumbled across the IFS environment variable.
IFS stands for “Internal Field Separator” 2, and holds the characters the
shell uses to split the results of expansions, such as the output of $(ls)
in a for loop, into separate words. By default IFS contains a space, a tab
and a newline, thus filenames with spaces in them get split into multiple
parts and hence commands look for files called My, Cool and File.txt, which
don’t exist.
One solution is to set IFS to a new value, process the files and then set
IFS back to its old value. I got this tip from the blog post about
handling filenames with spaces in bash,
and I thought I’d share my take on what is effectively an already solved
problem. Why? Well, maybe next time I’ll be able to find the solution
quicker, and by writing about it I might remember it instead of having to
rely on StackOverflow and Google.
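The effect of changing IFS can be seen with a minimal bash sketch, assuming two hypothetical files in a mktemp scratch directory. The list_words function is made up for illustration; it splits $(ls) once with the default IFS and once with IFS set to just a newline via bash’s $'\n' quoting:

```bash
#!/usr/bin/env bash
# Compare word splitting of $(ls) with the default IFS and a newline-only IFS.
# list_words is a hypothetical helper; its body runs in a subshell, so the
# IFS change never leaks out to the caller.
list_words() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch "Beowulf.txt" "My Cool File.txt"   # hypothetical example files
    if [ "$1" = newline ]; then
        IFS=$'\n'                            # split only on newlines
    fi
    for name in $(ls); do
        printf '[%s]\n' "$name"
    done
    cd / && rm -rf "$dir"
)

echo "default IFS:"; list_words default
echo "newline IFS:"; list_words newline
```

With the default IFS the second file comes out as three words; with the newline-only IFS it stays whole. Running the loop in a subshell is a swapped-in alternative to saving and restoring IFS by hand: the change simply disappears when the subshell exits.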
A fictitious but realistic example
Let’s imagine that you’ve been given the following list of files:
Let’s also imagine that, in order to make simple things like cp, or even
for name in $(ls *.txt); do aspell -c $name; done, work, you’d first like
to convert all spaces in the filenames to underscores. Also, you’d like
filenames containing the sequence “space-hyphen-space” to have it converted
to a single underscore. To achieve this, you’d use a loop something like the
following:
```bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b") # or $'\n\b'
for file in $(ls)
do
    new_file=$(echo $file | sed 's/ - /_/g' | sed 's/ /_/g')
    mv $file $new_file
done
IFS=$SAVEIFS
```
This code needs a bit of explaining:
- line 1: the current value of IFS is saved in SAVEIFS so that we can set it back later.
- line 2: IFS is set to the result of echoing (echo) the sequence of a newline character followed by a backspace ("\n\b"). This sequence of backslash-escaped characters is interpreted by echo via the -e switch; the -n switch suppresses the trailing newline which echo would normally append to its output. The original blog post gives no reason for the backspace, but the likely one is that command substitution strips trailing newlines from a command’s output: IFS=$(echo -en "\n") on its own would leave IFS empty, which switches word splitting off altogether. Appending \b, a character that is very unlikely to appear in a filename, keeps the newline intact. Note also that the sequence $'\n\b' would also work and doesn’t require echo at all; since no command substitution is involved there, a plain $'\n' would do too.
- line 3: loop over files in the current directory, selected by a simple ls; the $(...) syntax executes the command and is considered better practice than using backticks (which one would have used in yesteryear).
- line 5: the new filename is determined by substituting “space-hyphen-space” and “space” in the original filename with underscores using sed (the g means globally, so the substitution happens for all occurrences of the pattern within the filename).
- line 6: the old filename is replaced with the new filename.
- line 8: IFS is reset to its original value (so weird things won’t happen in later processing within the current shell session).
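The role of the \b in line 2 of the loop can be checked directly. This minimal bash sketch (the variable names are made up for illustration) measures how many characters each way of building the IFS value actually produces:

```bash
#!/usr/bin/env bash
# Command substitution strips trailing newlines from a command's output.
only_newline=$(echo -en "\n")         # the lone newline is stripped: empty string
newline_backspace=$(echo -en "\n\b")  # the trailing \b protects the newline
ansi_newline=$'\n'                    # ANSI-C quoting: no command substitution involved

printf 'only_newline:      %d chars\n' "${#only_newline}"
printf 'newline_backspace: %d chars\n' "${#newline_backspace}"
printf 'ansi_newline:      %d chars\n' "${#ansi_newline}"
```

The lengths come out as 0, 2 and 1: only the $(echo ...) form loses its newline, which is why it needs the extra character.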
Running the code gives us the output we want:
Note that mv will complain about Beowulf.txt, since you’re trying to rename
a file to the same name; this isn’t a problem, but you will get a warning,
just so you know.
And that’s it! Now it’s possible to process the files more easily, since
they don’t contain spaces. If one wished, however, it would also be possible
to process them with the spaces still in place, using the command line and
knowledge of the IFS environment variable.
[Update: use globbing instead of ls]
It turns out the solution is much easier than described above. The day
after I wrote the above text, a link to the Bash Pitfalls
page turned up in my Twitter feed.
The first entry on that page explains that it’s not at all necessary to use
the output of ls as input to a for loop (and in fact that it’s a really
bad idea, since the filenames could contain spaces!). The solution is to
match the filename pattern we’re interested in with a glob (such as *.txt)
and then to quote the expansion when using it within a command. This means
we don’t need to muck around with internal field separators at all, which
can only be a good thing. Thus, the shell code we need to write looks like
this:
```bash
for file in *.txt
do
    new_file=$(echo "$file" | sed 's/ - /_/g' | sed 's/ /_/g')
    mv "$file" "$new_file"
done
```
Notice the improved for statement on line 1 (and the lack of IFS
environment variable setup) and the quotes around the shell variable
expansions on lines 3 and 4. This is shorter, simpler and easier to read and
understand, which is a definite improvement.
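The sed pipeline can also be replaced by bash’s own ${var//pattern/replacement} expansion. The following is a minimal sketch of that swapped-in technique, not the approach from the Bash Pitfalls page; the function name rename_spaces is hypothetical. It also skips names that wouldn’t change and guards against the case where *.txt matches nothing:

```bash
#!/usr/bin/env bash
# Glob-based rename using bash parameter expansion instead of sed.
# rename_spaces is a hypothetical helper; run it inside the target directory.
rename_spaces() {
    local file new_file
    for file in *.txt; do
        [ -e "$file" ] || continue             # no match: the glob stays as the literal "*.txt"
        new_file=${file// - /_}                # every "space-hyphen-space" -> "_"
        new_file=${new_file// /_}              # every remaining space -> "_"
        [ "$file" = "$new_file" ] && continue  # name unchanged: skip it
        mv -- "$file" "$new_file"              # "--" protects names starting with "-"
    done
}
```

To use it, cd into the directory holding the files and call rename_spaces. Skipping unchanged names avoids the mv warning about Beowulf.txt; note that ${var//pattern/replacement} is a bash feature, so this won’t run in a plain POSIX sh.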
1. I learned that the ls --quoting-style options could one day come in handy; for instance, ls --quoting-style=escape escapes special characters in filenames (but doesn’t help us here, unfortunately), and ls -Q puts quotes around the filenames. It’s amazing what reading manpages can bring sometimes.
2. You can look it up in the bash manpage, although it wasn’t obvious from the manual text how to solve the current problem.