Processing files with spaces in filenames
Sometimes I get files from friends who use certain graphical operating systems, where it’s ok to use spaces in filenames. Processing these files on Unix isn’t that much fun, since the shell uses spaces to separate a command from its arguments. It sucks to try to process a list of files and get output like this:
My: No such file or directory
Cool: No such file or directory
File.txt: No such file or directory
when really the file My Cool File.txt
should have been read and processed.
How to get around this? My usual procedure is simply to convert the spaces
to underscores by hand:
$ mv My\ Cool\ File.txt My_Cool_File.txt
This is slow and error-prone, since one needs to remember to escape the
spaces; still, for one-off instances it is a pragmatic option. If there is a
long list of such files, however, it is amazingly tedious. So, for once, I
decided to sit down and work out how to convert the spaces to underscores
with a for loop in bash.
Internal Field Separator to the rescue
After spending ages trying to work out how to, e.g., pipe the output of ls
into a bash array, or use some of the various quoting options to ls [1],
I eventually stumbled across the IFS shell variable.
IFS stands for “Internal Field Separator” [2]. Bash uses it to split the
result of unquoted expansions, such as the $(ls) in a for loop, into
separate words. By default IFS contains a space, a tab and a newline, so
filenames with spaces in them get split into multiple parts, and hence
commands can’t find a file called My, Cool or File.txt.
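The splitting can be seen directly with a small sketch. It assumes bash and uses a hypothetical variable, names, as a stand-in for the output of ls. The loop counts how many words an unquoted expansion produces, first with the default IFS and then with IFS set to a newline only.

```shell
#!/usr/bin/env bash
# Demonstrate how IFS controls word splitting of an unquoted expansion.
# 'names' is a hypothetical stand-in for the output of ls (one name per line).
names=$'My Cool File.txt\nBeowulf.txt'

# Default IFS (space, tab, newline): the first filename is split on its spaces.
count=0
for word in $names; do
    count=$((count + 1))
done
echo "default IFS: $count words"   # prints: default IFS: 4 words

# IFS set to a single newline: only line breaks separate words.
saved_ifs=$IFS
IFS=$'\n'
count=0
for word in $names; do
    count=$((count + 1))
done
IFS=$saved_ifs                      # restore the previous value
echo "newline IFS: $count words"   # prints: newline IFS: 2 words
```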
One solution is to set IFS to a new value, process the files and then set
IFS back to its old value. I got this tip from a blog post about handling
filenames with spaces in bash, and I thought I’d share my take on what is
effectively an already solved problem. Why? Well, maybe next time I’ll be
able to find the solution quicker, and by writing about it I might remember
it instead of having to rely on StackOverflow and Google.
A fictitious but realistic example
Let’s imagine that you’ve been given the following list of files:
A Tale of Two Cities.txt
Beowulf.txt
Pride and Prejudice.txt
The Adventures of Tom Sawyer.txt
The Count of Monte Cristo.txt
The Importance of Being Earnest - A Trivial Comedy for Serious People.txt
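To try the renaming loops without touching real data, you could recreate these files in a scratch directory. This is a minimal sketch. It assumes bash and that mktemp -d is available, which it is on Linux and macOS. Quoting each name keeps the spaces as part of the filename.

```shell
#!/usr/bin/env bash
# Create the example files in a fresh temporary directory (assumes mktemp -d).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Each name is quoted, so the spaces become part of the filename.
touch "A Tale of Two Cities.txt" \
      "Beowulf.txt" \
      "Pride and Prejudice.txt" \
      "The Adventures of Tom Sawyer.txt" \
      "The Count of Monte Cristo.txt" \
      "The Importance of Being Earnest - A Trivial Comedy for Serious People.txt"

ls -1    # one filename per line
```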
Let’s also imagine that, before doing simple things like mv, cp or even

for name in $(ls *.txt); do aspell -c $name; done

you’d first like to convert all spaces in the filenames to underscores. You’d
also like the sequence “space-hyphen-space” in filenames to be converted to a
single underscore. To achieve this, you’d use a loop something like the
following:
SAVEIFS=$IFS
IFS=$(echo -en "\n\b") # or $'\n\b'
for file in $(ls)
do
new_file=$(echo $file | sed 's/ - /_/g' | sed 's/ /_/g')
mv $file $new_file
done
IFS=$SAVEIFS
This code needs a bit of explaining:
- line 1: the current value of IFS is saved in SAVEIFS so that we can set it back later.
- line 2: IFS is set to the result of echoing (echo) the sequence of a newline character followed by a backspace ("\n\b"). The backslash escapes in this sequence are interpreted by echo because of the -e switch; the -n switch suppresses the trailing newline which echo would normally append to its output. The original blog post gave no reason for this particular sequence. The most likely one is that command substitution ($(...)) strips trailing newlines from a command’s output, so $(echo -en "\n") would leave IFS empty; the backspace after the newline keeps the newline intact. A newline on its own is therefore all IFS really needs here. The sequence $'\n\b' (or simply $'\n') works too and doesn’t require echo to run.
- line 3: loop over the files in the current directory, as listed by a simple ls. The $() around ls executes the command and is considered better practice than using backticks (which one would have used in yesteryear).
- line 5: the new filename is determined by replacing “space-hyphen-space”, and then any remaining spaces, in the original filename with underscores (the g means globally, so the substitution happens for all occurrences of the pattern within the filename).
- line 6: the old filename is replaced with the new filename.
- line 8: IFS is reset to its original value (so weird things won’t happen in later processing within the current shell session).
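The effect of command substitution on trailing newlines can be checked with a short sketch. It assumes bash, since the builtin echo understands -e and -n. It compares the length of each candidate IFS value. A backspace counts as one character, so "\n\b" has length 2.

```shell
#!/usr/bin/env bash
# Command substitution removes trailing newlines from a command's output.
only_newline=$(echo -en "\n")          # the newline is stripped: empty string
newline_backspace=$(echo -en "\n\b")   # the newline survives, followed by \b
ansi_c_newline=$'\n'                   # ANSI-C quoting: no command substitution

# printf %q shows exactly which characters each variable holds.
printf '%q\n' "$only_newline" "$newline_backspace" "$ansi_c_newline"

echo "lengths: ${#only_newline} ${#newline_backspace} ${#ansi_c_newline}"  # prints: lengths: 0 2 1
```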
Running the code gives us the output we want:
$ ls
A_Tale_of_Two_Cities.txt
Beowulf.txt
Pride_and_Prejudice.txt
The_Adventures_of_Tom_Sawyer.txt
The_Count_of_Monte_Cristo.txt
The_Importance_of_Being_Earnest_A_Trivial_Comedy_for_Serious_People.txt
Note that mv will complain about Beowulf.txt, since you’re trying to rename
it to the name it already has; this isn’t a problem, but you will get a
warning, just so you know.
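If the warning bothers you, one option is to compare the old and new names and skip the files that don't change. The sketch below keeps the IFS approach of the loop above, but sets IFS with $'\n' rather than echo. It assumes bash and the file layout of the example, and it operates on the current directory.

```shell
#!/usr/bin/env bash
# Same renaming loop as above, but names that would not change are skipped,
# so mv is never asked to rename a file (e.g. Beowulf.txt) onto itself.
SAVEIFS=$IFS
IFS=$'\n'                       # split the output of ls on newlines only
for file in $(ls); do
    new_file=$(echo "$file" | sed -e 's/ - /_/g' -e 's/ /_/g')
    if [ "$file" = "$new_file" ]; then
        continue                # no spaces in the name: nothing to rename
    fi
    mv "$file" "$new_file"
done
IFS=$SAVEIFS                    # restore the original field separator
```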
Voila!
And that’s it! Now the files are easier to process, since they don’t contain
spaces. If one wished, though, it would also be possible to process them
with the spaces intact, using the command line and knowledge of the IFS
variable.
[Update: use globbing instead of ls]
It turns out the solution is much simpler than described above. The day
after I wrote the above text, a link to the Bash Pitfalls page turned up in
my Twitter feed. The first entry on that page explains that it’s not at all
necessary to use the output of ls as input to a for loop (and that in fact
it’s a really bad idea, since the filenames could contain spaces!). The
solution is to match the filename pattern we’re interested in with a glob
(such as *.txt) and then to quote the expansion when using it within a
command. This means we don’t need to muck around with internal field
separators at all, which can only be a good thing. Thus, the shell code we
need to write looks like this:
for file in *.txt
do
    new_file=$(echo "$file" | sed 's/ - /_/g' | sed 's/ /_/g')
    mv "$file" "$new_file"
done
Notice the improved for statement on line 1 (and the lack of any IFS setup)
and the quotes around the shell variable expansions on lines 3 and 4. This
is shorter, simpler and easier to read and understand, which is a definite
improvement.
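As a further variation, bash's own ${var//pattern/replacement} expansion can do the substitutions without running echo and sed for every file. This swaps in a technique the original loop does not use and is bash-specific. It is a sketch that also assumes you want nullglob, so that an unmatched *.txt expands to nothing. The option is switched off again afterwards.

```shell
#!/usr/bin/env bash
# Bash-only variant of the globbing loop using parameter expansion instead of sed.
shopt -s nullglob                  # no matches: *.txt expands to nothing
for file in *.txt; do
    new_file=${file// - /_}        # every "space-hyphen-space" -> "_"
    new_file=${new_file// /_}      # every remaining space -> "_"
    if [ "$file" != "$new_file" ]; then
        mv -- "$file" "$new_file"  # "--" guards against names starting with "-"
    fi
done
shopt -u nullglob                  # restore the default globbing behaviour
```

Because the substitutions happen inside the shell, no extra processes are started per file. The price is that the loop no longer runs under a plain POSIX sh.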
[1] I learned that the -Q and --quoting-style options could one day come in handy. For instance, ls --quoting-style=escape escapes special characters in filenames (which unfortunately doesn’t help us here), and ls -Q puts quotes around the filenames; it’s amazing what reading man pages can bring sometimes.
[2] You can look it up in the bash man page, although it wasn’t obvious how to solve the current problem from the manual text.