2006-08-19

Why Program?

Hopefully this will be the first of several articles I’ll be writing for
Cache. I’ve been writing software since around 1983 and writing it
professionally since around 1989. I’ve written in at least 10 different
languages including binary code (writing 8-bit Z80 machine instructions without
an assembler), 8, 16 and 32-bit assembler, popular languages such as C/C++, BASIC
and Java, scripting languages such as Ruby and the various UNIX scripting
languages, “teaching” languages such as Pascal as well as more exotic
languages such as Lisp and occam (a language for writing parallel programs).
Along the way I’ve learned a lot (I have made enough mistakes to give me
plenty of learning experiences!) and with any luck some of what I’ve got to say
will help you too. I will try not to get too technical but stay technical
enough to actually say something more than vague generalities.

In this first article though, I would like to ask a question which may seem to
have an obvious answer. Why program? Or rather, why should you program? Now,
obviously someone needs to write software or we’d be deprived of such
uplifting experiences as watching MS Word crash before we’ve saved our work or
enjoying the visitations of the Windows Blue Screen Of Death (BSOD). Also a
fair number of people, myself included, think writing software is an excellent
way to earn money. However, I’m going to argue that almost anyone who uses a
computer could benefit from having at least some programming skills. I’m going
to try a simple analogy to illustrate why.

Yoda provides an incentive to program

Let’s suppose that you’d like an action figure of Yoda from Star Wars (don’t
worry, you don’t have to tell me why you’d want such a thing). If you can’t
program, it’s like arriving at the store and finding you can have any action
figure you want as long as it’s Darth Vader, Luke Skywalker or Obi-Wan Kenobi.

But never mind programming, there are a lot of “no programming required” tools
out there, right? Well, most of these are template based and that’s like
settling for a mold of Darth Vader. Armed with your mold you can make lots of
plaster copies of Darth Vader and you can even choose what colours to paint
them, but once you’ve made and painted your figure it’s hard to change, and even
though you can buy Luke Skywalker and Obi-Wan Kenobi molds a Yoda one is still
not available.

Now let’s say you want that Yoda figure enough to be willing to put in a bit
of effort. You buy a kit of “super Lego” - this is really cool because not
only does it come with lots of pieces that are ideal for making a Yoda figure
but with a bit of effort you can even make new pieces. You end up with a Yoda
figure that can not only stand up on its own but can even bash the Darth Vader
plaster cast on the head with a light sabre. OK, making Yoda this way took a
lot longer than buying one and your Yoda figure looks a little rough around
the edges compared to something mass produced, but even if you’d been able to buy
a Yoda figure that could do all yours can it would have cost you a lot more than
what you paid for your “super Lego.”

In fact a modern programming environment is a lot like our mythical “super
Lego” - the pieces are like the libraries that come with languages such as
Java, Python, Ruby, C++ etc. and the language itself allows you to make new
“pieces” which are indistinguishable from the libraries that came with the
language.

If you knew that you could get a kit of “super Lego” for free would you be
tempted to give up your Darth Vader plaster casts and spend a little time
learning how to use it?

A real life program

Let’s leave Yoda alone for the moment and move on to something a little more
like programming. Even if you don’t want or need to program “in the large”
(write a reasonably complex game, an alternative to MS Word or a web app to
compete with Amazon.com) I find that there are plenty of times when it’s good
to know how to program “in the small.” It’s not glamorous, it (probably) won’t
earn you mountains of cash but it will let you get your computer to do a bunch
of tasks that you’d otherwise have to do manually or not be able to do at all.

You want examples? Just before writing this article I was putting together a
set of images to test some software I’ve been working on. However, they all
had names like “dave picking his nose.jpg” and “george bush cartoon.gif” and
being the neatness freak that I am I wanted them to have nice convenient names
like “test1.jpg”, “test2.gif” etc. So, I have a choice:

I can rename all the files manually. If there were 10 maybe I’d do it, but there are several hundred and I don’t care that much. If I didn’t know how to program, the file renaming probably wouldn’t get done.
I can write a little script that gets my computer to do all the work.

Here’s the script (I’ll explain what it does in a moment):

fogey% ls -1 *.jpg | grep -v method_ | awk 'BEGIN { count=1 } { printf("mv %s test%d.jpg\n", $0, count++); }' > jpgfiles.sh
fogey% bash jpgfiles.sh

Time to write it: about 30 seconds. Time taken to run it: < 1 second. Is it
pretty? Not really. Is there a more elegant way I could have done it?
Probably. However, my goal is not to show you the most elegant solution
(assuming one exists) but to show you what small, “throw away” programs are
like. You don’t care if they’re elegant because you might only use them once.
You do care about writing it quickly and getting the job done; after all if I
wanted to spend hours renaming files I might just as well do it manually.
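One caveat worth spelling out: names such as “dave picking his nose.jpg” contain spaces, and the shell splits an unquoted mv line at every space, so that line would not do what was intended. What follows is a minimal sketch of a variant that wraps each source name in double quotes. The scratch directory and the sample file names are made up for the demonstration, and the sketch still assumes that no name contains a double quote, dollar sign, backquote or backslash.

```shell
#!/usr/bin/env bash
# Sketch: the same generate-then-run idea, but each source name is quoted
# so that names containing spaces survive. Directory and files are made up.
set -eu

workdir=$(mktemp -d)          # scratch directory, so nothing real is touched
cd "$workdir"
touch "dave picking his nose.jpg" "holiday snap.jpg" "method_a.jpg"

# \" inside the awk string puts a literal double quote into the output
ls -1 *.jpg | grep -v method_ |
  awk 'BEGIN { count=1 } { printf("mv \"%s\" test%d.jpg\n", $0, count++); }' > jpgfiles.sh

cat jpgfiles.sh               # inspect the generated commands first
bash jpgfiles.sh              # then run them
ls -1                         # show the result
```

The only change from the one-liner is the pair of escaped quotes around %s; everything else, including the generated file you can inspect before running, stays the same.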

So, when programming “in the small” like this I’d argue the rules are
different from what you might have been taught. If you are writing pieces of
software that you want to keep or that are more than a few tens of lines long
your goals should be:

Make the program correct.
Make the program maintainable.
Make the program efficient enough to be useful.

Note that efficiency came third on a list of three items. Firstly, for the program to be useful it has to be correct - I won’t care if it runs fast if it doesn’t do what I want. Secondly, you are likely to spend more time modifying and adapting any program that you keep for a while than you did writing it in the first place so it’s important that the structure is clear enough for you to be able to understand it and change it six months after you wrote it. Many inexperienced programmers worry both too much and too little about efficiency. Too much, because they worry about speed before correctness. Too little, because they make poor choices of data structure and algorithm that mean that efficiency is doomed from the start.

When you’re programming “in the small” the rules are different:

Make the program correct.
Write it quickly.

Even small programs have to be correct in order to be useful. However, the goal when writing tiny scripts like the one above is not to produce a work of art that you’ll treasure for the rest of your life but to get the computer to save you time so you can get back to doing the thing you actually want to do (which may or may not involve programming).

Summary

So, to summarize, I think knowing how to program in the small is very useful
for a number of reasons:

It can save you a lot of time even if you don’t care about writing large programs.
It’s a great way of experimenting and trying things that will help you learn how to write larger programs.
The programs are small and quick to write, so they are easy to understand and you get (almost) instant feedback - you don’t have to invest hours of work before you know if you’re on the right track or not.

There are some excellent free tools available for small and large programming
on Windows, MacOS and Linux so you don’t even have to spend any money - the
only thing you have to spend is a bit of your time.

Assuming you want to hear more, next time I’ll talk a bit about some of the
free tools I mentioned above and show you how to use them to do useful stuff.
In the meantime if you have any comments and questions I’d be happy to hear
them at mailbag@emptiness.org. If you have any examples of small tasks that
you’d like to know how to automate, send me an email and let me know what
platform (Windows, Mac or Linux) you’re using and if I have time I’ll discuss
them in subsequent articles.

The script decoded

Oh yes, the script. I hadn’t forgotten. Now the first thing to realise is that
the “script” as I showed it is actually not a single program but a series of
commands that generate a script and place it in the file jpgfiles.sh and then
a command to run it. fogey% is the command prompt of my terminal showing that
it’s waiting for input (fogey is the name of the machine) - since you see the
prompt twice this is a dead giveaway that I’m actually entering two commands.
Now since the first line generates a script in jpgfiles.sh I could always open
that file in an editor to check it is right before I run it - so I have a
built-in safety mechanism. The first line is actually three commands “piped”
together using “|”, which takes the output of one command and sends it to the
input of another. Add the second line and we can break the whole thing down
into four parts:

ls -1 *.jpg : gives a list of all files ending in .jpg, one per line
grep -v method_ : removes all files with “method_” in their names from the list
awk 'BEGIN { count=1 } { printf("mv %s test%d.jpg\n", $0, count++); }' > jpgfiles.sh : uses a simple language called “awk” to take each line and change it from something like “filename.jpg” to “mv filename.jpg test1.jpg”. “mv” is the UNIX command to rename a file so the generated script has one line for each file which uses the mv (rename) command to change the name of that file. The “> jpgfiles.sh” means “take the output of the command and put it in the file jpgfiles.sh”. I won’t go into more detail on awk here, since I’m running out of space, but I might say more in another article.
bash jpgfiles.sh : bash is one of many UNIX command interpreters. I use bash to run all the rename commands in jpgfiles.sh
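The four parts can also be tried one at a time, which is a handy way to see what each stage contributes. The following is a minimal sketch that builds the pipeline up step by step in a scratch directory. The sample file names are hypothetical and deliberately contain no spaces, and the bash -n step (a syntax check that reads the script without executing it) is an extra precaution rather than part of the original two commands.

```shell
#!/usr/bin/env bash
# Sketch: run each stage of the pipeline on its own to see what it adds.
# The scratch directory and file names are made up for the demonstration.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
touch cat.jpg dog.jpg method_old.jpg notes.txt

echo "--- part 1: every .jpg file, one per line"
ls -1 *.jpg

echo "--- part 2: the same list without the method_ files"
ls -1 *.jpg | grep -v method_

echo "--- part 3: each remaining name turned into a mv command"
ls -1 *.jpg | grep -v method_ |
  awk 'BEGIN { count=1 } { printf("mv %s test%d.jpg\n", $0, count++); }' > jpgfiles.sh
cat jpgfiles.sh

echo "--- part 4: syntax check only (bash -n), then run for real"
bash -n jpgfiles.sh
bash jpgfiles.sh
ls -1
```

Printing the output of each stage before adding the next is also how pipelines like this tend to get written in the first place: add one command, look at what comes out, then add the next.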

The above illustrates another feature of programming “in the small” - rather
than having a single program in a single language it’s common to use a bunch
of small commands and maybe even different “small” languages and join them
together to get the job done. This is often much quicker (to write) than
trying to do everything using a single command or language.
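For comparison, here is a sketch of the same rename done in bash alone, with a loop in place of the pipeline and the generated script. It is not the method used above, and the scratch directory and file names are made up. It copes with spaces in names, but you lose the chance to inspect the generated commands before they run, and whether it is quicker to write is largely a matter of habit.

```shell
#!/usr/bin/env bash
# Sketch: single-language alternative to the ls | grep | awk pipeline.
# The scratch directory and file names are made up for the demonstration.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
touch "george bush cartoon.jpg" plain.jpg method_skip.jpg

count=1
for f in *.jpg; do                 # the file list is expanded once, up front
  case "$f" in
    *method_*) continue ;;         # same job as grep -v method_
  esac
  mv -- "$f" "test$count.jpg"      # quoting keeps names with spaces intact
  count=$((count + 1))
done

ls -1
```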

If you’re using Linux or MacOS X then you have everything you need to try
this script already. If you’re running Windows, install Cygwin and you’ll be
able to run these UNIX commands too.

If you’re impatient to understand awk take a look at
http://www.vectorsite.net/tsawk.html