While you can live a (mostly) happy Linux existence with just GUI tools, I think a strong base working on the command-line can drastically improve your flexibility and efficiency. I urge everyone to spend some time studying all of the great Bash tutorials out there.

I am constantly learning new things. For instance, I had never considered combining multiple commands into a single alias before.
Which leads us to my favorite Bash trick of all time, the single thing I use more than anything else: performing actions in a loop over the output of another command.
There are different ways to do this, the most compact being xargs, but I prefer to use while read.
The main reason I like while over xargs is that xargs will die if you’re working with too many items, since the entire operation happens in a single command-line action, whereas while executes many separate actions.
This will make a whole lot more sense with some examples.
Here’s one I use all the time:
Deleting all .svn directories from a path (handy for making a working copy not be a working copy any more).
find ./ -type d -name ".svn" | while read d; do rm -rf "$d"; done
Let’s break this down a little.
The first thing we’re doing is a simple find command that finds all directories (-type d) named .svn (-name ".svn") under the current directory.
The next thing, and this is where all of the power of this construct comes from, is that we pipe the output of the find command to a Bash while loop. The while loop iterates over the find output and puts each value into the Bash variable $d. (The name of the variable can be anything; I usually stick with something like ‘d’ for directories or ‘f’ for files.) Then, each time through the loop, we delete the directory named in $d.
The basic form of the while loop looks like this:
while read d; do <stuff happens here>; done
What you do within the while loop is entirely up to you. All of the usual Bash concepts for stringing commands together (using ;, &, or |) still work.
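As a minimal sketch of that basic form, here is a loop over three lines of made-up input (the sample lines and the variable names line and output are only for illustration). It adds two common precautions to read: IFS= keeps leading and trailing whitespace, and -r stops backslashes from being treated as escapes.

```shell
# Hypothetical sample input: three lines, one with leading spaces and one with a backslash.
output=$(printf '%s\n' 'one' '  two words' 'back\slash' |
    while IFS= read -r line; do
        # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
        printf '[%s]\n' "$line"
    done)
printf '%s\n' "$output"
```

Each line comes through unchanged inside the brackets. With a plain read, the leading spaces and the backslash would be dropped.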
The real power of this idea is that you can use it with anything that has newline-separated output.
Let’s say you have a file that contains a list of files like this:
foo_file1.txt
foo_file2.txt
bar_file1.txt
foo_file3.txt
foo_file4.txt
bar_file2.txt
Maybe we want to make all files that start with “bar_” world-writable.
egrep "^bar_" filelist.txt | while read f; do chmod a+w "$f"; done
The egrep command gives us all lines from the file that start with “bar_”. That output gets piped to the while loop which does the chmod command.
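If you want to try this safely, the sketch below sets everything up in a throwaway directory first. The directory from mktemp and the shortened file list are assumptions for the demo, and it uses grep -E, which is the current spelling of egrep.

```shell
# Sketch in a throwaway directory; the file names mirror the list above.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' foo_file1.txt foo_file2.txt bar_file1.txt bar_file2.txt > filelist.txt

# Create every listed file and strip all write permissions.
while IFS= read -r f; do
    touch "$f"
    chmod a-w "$f"
done < filelist.txt

# Only the names starting with "bar_" reach the loop, so only those get a+w.
grep -E '^bar_' filelist.txt | while IFS= read -r f; do chmod a+w "$f"; done

ls -l
```

In the listing, the bar_ files should show write permission for user, group, and other, while the foo_ files stay read-only.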
Now a real world example.
My previous job involved dealing with massive amounts of data from an online survey. Of the many fields entered as part of this survey, one field was the price people paid for something. We had some validation on the price field but, one quarter, a bogus (and very large) price snuck through into the final data. This was a large problem because we generated a lot of reports that used the Average Price Paid, and this number was throwing them all off. We needed to (quickly) go in and update all records.
Unfortunately, this involved checking and updating more tables than a sane person would ever want to deal with by hand.
So it was while loop to the rescue!
Step one is to construct a command line that will loop over all of the tables in a database. We have the egrep call to strip out the first line of the output, which isn’t actually a table name. This example just prints out the name of each table.
mysql -B -e "show tables" thedb | egrep -v "^Tables_in_" | while read tbl; do echo "$tbl"; done
Step two does a SELECT on each table and prints all of the ids with an invalid price. Once again, we do a grep to strip out some extra info that mysql is printing out.
mysql -B -e "show tables" thedb | egrep -v "^Tables_in_" | while read tbl; do mysql -B -e "SELECT id FROM $tbl WHERE price >= 10000" thedb | egrep "^[0-9]"; done
That’s the Bash trick I use the most. Remember, it works for anything that has newline-separated output.


Mark, great ways to show the usefulness of loops in bash. Just a suggestion based on your article:
‘find’ has a -exec option, so your command:
find ./ -type d -name ".svn" | while read d; do rm -rf "$d"; done
Could be shortened to
find ./ -type d -name ".svn" -exec rm -rf {} \;
The braces ({}) are replaced with each directory found by the first part of the find command, one per invocation of rm. Surprisingly, in find commands with enormous output, this actually does save a great deal of time as bash doesn’t need to loop to handle any of the output itself.
Posted by Sean on May 14th, 2009.
Using -exec is a great shorthand. Thanks for mentioning it!
However (while I don’t remember the exact length) there comes a point where a command-line becomes too long and throws an error and I thought (and please correct me if I’m wrong) that -exec and xargs were both susceptible to hitting this limit over large output sets.
Posted by mark on May 14th, 2009.
With regard to find, I don’t believe so. find … -exec essentially loops itself. It executes the command after -exec with each found “file” individually.
I believe you are right, from what I remember (and a quick Google search), about xargs. However, if you throw it a ‘-l’ (or maybe now ‘-L’, as the lowercase ‘l’ seems to be a deprecated option these days — still works, though) it handles the limitation better. If you do that, xargs will only handle a certain number of command line options (defaults to one), and then will process the remaining output with separate calls to the command specified by xargs.
Redirection is a pretty cool thing too: http://docstore.mik.ua/orelly/unix/upt/ch09_21.htm
Regardless, the command line argument length is handled by the ‘execve’ system call, I believe. However, when we start talking about such things, I get lost. I’m just a sysadmin, not a kernel developer.
Posted by Sean on May 14th, 2009.
That’s great info. Thanks for passing it along.
I’ll have to do some experimenting with -exec. I’m a big fan of anything that lets me type less.
Posted by mark on May 14th, 2009.
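To make the xargs batching discussed above concrete, here is a small sketch using made-up input (the letters a to e) with echo standing in for a real command. -n caps the number of arguments per invocation and -L caps the number of input lines per invocation.

```shell
# Hypothetical input: one-word lines; echo stands in for the real command.
by_args=$(printf '%s\n' a b c d e | xargs -n 2 echo)
by_line=$(printf '%s\n' a b c | xargs -L 1 echo)

# Each output line corresponds to one separate invocation of echo.
printf 'xargs -n 2:\n%s\n' "$by_args"
printf 'xargs -L 1:\n%s\n' "$by_line"
```

Even without -n or -L, xargs splits its input across as many invocations as it needs to stay under the system's argument-length limit, so these options mainly matter when the command itself wants a fixed number of arguments.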
Here’s a question.
Is it possible to perform multiple commands with -exec? Or have a -exec pipe to another find -exec?
Posted by mark on May 14th, 2009.
Well, you could always nest them…
I doubt that piping one find to another would work just because of command construction problems.
find . -type d -exec sh -c 'find "$1" -exec someothercommand \;' sh {} \;
I would assume something like that should work. If I could come up with a use, I’d have tested it.
Also, remember a pipe redirects the previous command’s STDOUT to the next command’s STDIN. Some commands which typically write to a file can write to STDOUT, too. A fun use of this is tunneling something via ssh. For example:
The latency associated with scp’ing a whole slew of files is ridiculous. If you’re doing thousands of files, scp individually encrypts each file, instead of encrypting the entire stream. To get around that, I typically do something fun like this:
tar -cpf - * | ssh -q someotherhost "tar -xpf - -C /thedirectorywhereyouwannaputit/."
This will encrypt the entire stream, which helps with the latency.
It gets even more fun when you have to do that through an intermediary host. I can’t remember how to structure that one properly. Gotta love networks you’re not allowed to route to via conventional means.
Posted by Sean on May 14th, 2009.
find ./ -type d -name ".svn" | while read d; do rm -rf "$d"; done
can be done a lot more efficiently when written like:
find ./ -type d -name ".svn" -print0 | xargs -0 rm -rf
This invokes rm with a long argument list (once, or a handful of times if the list is too long for a single command line), which is a lot faster than running rm for each directory.
Posted by Tom on November 20th, 2009.
Another nice tip. I like it when people can show me more compact ways to do these kinds of things.
Posted by mark on November 20th, 2009.
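A quick way to see why the -print0 / -0 pairing matters is to try it on a directory name containing a space. The sketch below builds a throwaway tree (the directory names are made up) and also adds -prune, which is an extra not mentioned above: it stops find from trying to descend into a .svn directory that is about to be deleted.

```shell
# Throwaway tree with a hypothetical directory name containing a space.
workdir=$(mktemp -d)
mkdir -p "$workdir/my project/.svn" "$workdir/other/.svn"

# -print0 and -0 separate names with NUL bytes, so the space is harmless.
# -prune keeps find from walking into the directories being removed.
find "$workdir" -type d -name .svn -prune -print0 | xargs -0 rm -rf

leftover=$(find "$workdir" -name .svn)
echo "leftover .svn directories: ${leftover:-none}"
```

The parent directories, including the one with a space in its name, are left in place; only the .svn directories are removed.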
Great article. However, there is one caveat with bash while loops and pipes. Let’s have a look at the following example:
ERROR_COUNT=0
ifconfig | awk '/inet addr/ && $2 !~ /addr:127.0.0.1/ { print substr($2, 6) }' | while read IP; do
    RES=`host $IP`
    RETVAL=$?
    if [ $RETVAL -gt 0 ]; then
        ERROR_COUNT=$((ERROR_COUNT+1))
        echo "ERROR $RES"
    else
        echo "OK $RES"
    fi
done
exit $ERROR_COUNT
You’d expect it to exit with a non-zero status in case of errors. However, this is not the case: bash runs each part of a pipeline, including the while loop, in a sub-shell, and variable changes made in a sub-shell do not propagate back to the parent. So, once we leave the while loop, $ERROR_COUNT evaluates to 0 again.
Any ideas on how to fix it?
Posted by Alex J on July 6th, 2010.
Interesting question. I wonder if it would work to use “export” to create ERROR_COUNT as an environment variable.
Posted by mark on July 7th, 2010.
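For what it's worth, export would not help here: an exported variable is copied into child processes, and changes made in the child never flow back to the parent. One common way around the problem in bash is to feed the loop with process substitution instead of a pipe, so the loop runs in the current shell. The sketch below is a stand-in, not the original script: the check function and the ok/bad input lines are made up to replace the ifconfig and host calls.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the ifconfig/host pipeline: "bad" lines count as errors.
check() { [ "$1" != "bad" ]; }

piped_count=0
printf '%s\n' ok bad ok bad | while IFS= read -r item; do
    check "$item" || piped_count=$((piped_count + 1))
done
# The loop above ran in a sub-shell, so the parent's counter is unchanged.

subst_count=0
while IFS= read -r item; do
    check "$item" || subst_count=$((subst_count + 1))
done < <(printf '%s\n' ok bad ok bad)
# Process substitution keeps the loop in the current shell, so the count survives.

echo "pipe: $piped_count, process substitution: $subst_count"
```

Under bash's default settings this prints "pipe: 0, process substitution: 2". Note that process substitution is a bash feature and is not available in plain POSIX sh.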