Friday, November 11, 2011

How to Improve Performance of Shell Scripts

Shell scripts in Linux/Unix have the potential to do a lot of work. A lot of times they will start out as just a couple of lines written as a convenience to avoid typing in a few commands repeatedly. Then a couple more commands are added, then a couple more, and eventually you have a huge script that sets up your entire network, checks your email, makes you pancakes, and walks your dog.

One doesn't usually think too hard about the performance of shell scripts. But if your scripts do a lot of work, chances are you can make them run several times faster!
The thing with traditional shell scripts is that they are usually made up of many calls to external utilities that do very specific things, like 'sed', 'grep', 'awk', 'cat', 'dog', 'head', 'tail', 'strings', and even 'perl'. It usually doesn't matter, because the script is just running a couple of commands together as a convenience. So not many people think about the performance impact of calling different Unix utilities and piping output to and from them. I think a lot of people will be surprised at how expensive it is.

The following script is written in the traditional way, piping output to Unix utilities to do the processing.

File: pipes.sh
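#!/bin/bash
# Count the entries in /dev whose group is "audio".
c=0
for f in /dev/* ; do
    # One ls and one cut are started for every entry.
    # For a plain file or device node ls prints one line, and the group is its 4th field.
    group=$(ls -l "$f" | cut -d ' ' -f 4)
    if [[ $group == audio ]] ; then
        ((c+=1))
    fi
done
echo "$c audio devices"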

Running it produced the following output:
$ time ./pipes.sh
7 audio devices

real    0m7.307s
user    0m2.736s
sys     0m4.364s

Note that for each file in the /dev directory, two external executables (ls and cut) were started and connected by one pipe.
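
To get a feel for how many processes that adds up to, here is a rough sketch that counts the entries the loop visits and multiplies by two. The function name estimate_spawns is made up for this example, and the figure is only a lower bound, since it ignores any extra subshells bash creates for the command substitution.

```shell
#!/bin/bash
# Sketch: estimate how many external programs pipes.sh starts for a directory.
# estimate_spawns is a hypothetical helper, not part of the original script.
estimate_spawns() {
    local dir=$1 per_entry=2 n=0 f   # 2 = one ls plus one cut per entry
    for f in "$dir"/* ; do
        # An empty directory leaves the glob unexpanded; skip that literal.
        if [[ ! -e $f && ! -L $f ]] ; then
            continue
        fi
        n=$((n + 1))
    done
    echo $((n * per_entry))
}

estimate_spawns /dev
```

On a machine with a few hundred entries in /dev, that is several hundred program launches for a loop that only compares strings.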

So, about 7 seconds. OK, well, I'm not really sure how slow that is yet, because I don't have much to compare it to. It seems fine, right? It's not really that long!

Well, here's the same functionality with only a single external command and pipe:

File: nopipes.sh
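#!/bin/bash
# Count the entries in /dev whose group is "audio", running ls only once.
c=0
# The braces keep the loop and the final echo in the same subshell,
# so the value of c is still available when it is printed.
ls -l /dev | { while read -r -a line ; do
        group=${line[3]}    # 4th whitespace-separated field is the group
        if [[ $group == audio ]] ; then
            ((c+=1))
        fi
    done
    echo "$c audio devices"
}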

And the results:
$ time ./nopipes.sh
7 audio devices

real    0m0.131s
user    0m0.076s
sys     0m0.044s
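
A side note on that script: in bash, the parts of a pipeline normally run in subshells, which is why the loop and the echo are grouped in braces. As a minimal sketch of an alternative, assuming bash (process substitution is not available in plain sh), the loop can read from `< <(...)` instead, so it runs in the current shell and the counter is still set after `done`. The function name count_group and its two arguments are made up for this example.

```shell
#!/bin/bash
# Sketch: count entries in a directory that belong to a given group,
# running ls once and keeping the counter in the current shell.
# count_group is a hypothetical helper, not part of the original scripts.
count_group() {
    local dir=$1 want=$2 c=0
    local perms links owner group rest
    # read splits each ls -l line on whitespace; the 4th field is the group.
    while read -r perms links owner group rest ; do
        if [[ $group == "$want" ]] ; then
            c=$((c + 1))
        fi
    done < <(ls -l "$dir")
    echo "$c"
}

count_group /dev audio
```

It still starts only one external program, so it should perform about the same as nopipes.sh; the difference is just where the counter lives.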

I would just like to point out that the version that starts two external utilities for every file took about 56 times longer (7.307s versus 0.131s)! Launching those utilities inside the loop slowed the script down by roughly 5,500%!!!

The lesson is, doing as much work as possible in the local process, using only the scripting language itself, can drastically improve the performance of your scripts.
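
The same idea applies on a smaller scale. As a hedged illustration, here is a sketch that strips the directory and a .txt extension from a path two ways: once with the external basename utility and once with bash parameter expansion. The function names and the sample path are made up for this example.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not from the original scripts.

# External version: starts the basename program on every call.
name_external() {
    basename "$1" .txt
}

# Builtin version: parameter expansion only, no new process.
name_builtin() {
    local base=${1##*/}             # drop everything up to the last slash
    printf '%s\n' "${base%.txt}"    # drop a trailing .txt, if present
}

name_external /tmp/logs/report.txt
name_builtin  /tmp/logs/report.txt
```

Both calls print "report". The two are not identical in every corner case (a path with a trailing slash, for example), so it is worth checking the inputs your script actually sees before swapping one for the other inside a hot loop.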
