
In the Unix world you sometimes hear someone mention an “AWK hammer” or an “AWK nail,” usually in reference to an unexpected, possibly suspect, way of using some tool. Since history repeats itself, in the corporate world one might have heard (but never did) a reference to a “Lotus 1-2-3 hammer.”

The implication is that someone has fallen in love with a particular tool and is using it everywhere. In particular it applies to a situation where using that beloved tool may not have been the ideal choice.

Which is not to say it implies the choice was bad. Generally the unexpected way of using the beloved tool works. It’s just not the choice an average worker would most likely have made given a broad selection of tools.

It’s just unexpected in a way that makes one think, “Ah, ha! Someone really likes that there hammer they’ve got!”

The references come from the immortal phrase, “When you’ve got an AWK hammer, everything looks like an AWK nail.”

For those who never swam in Unix-y waters, AWK — it must be admitted — was a pretty awesome tool. It’s easy to see why someone would think, especially on first discovering it, that it was like wearing black: goes with everything.

Read the Wiki article (or the AWK man page) for more details, but the main thing is how differently AWK works from most programming languages.

It’s based around the idea of processing lines of input text. A program consists of a series of clauses, each with the form:

condition { action }

The AWK engine scans each line of the input text and tests it against the condition of each clause. If the condition causes a match, the AWK engine does the action.
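To make that loop concrete, here is a minimal sketch in Python of the scan-and-match idea. It is not how AWK is actually implemented; the function run_clauses and the two example clauses are made up for illustration, and each clause is simply modeled as a (condition, action) pair of functions.

```python
def run_clauses(clauses, lines):
    """Test every line against every clause, in order; run actions on matches."""
    for line in lines:
        for condition, action in clauses:
            if condition(line):
                action(line)


# Hypothetical two-clause program: collect error lines, count all lines.
errors = []
totals = {"lines": 0}


def count_line(line):
    totals["lines"] += 1


clauses = [
    (lambda line: "error" in line, errors.append),
    (lambda line: True, count_line),  # an always-true condition matches every line
]

run_clauses(clauses, ["all good", "disk error", "still fine"])
print(errors, totals["lines"])
```

Note that every clause gets a look at every line, so one line can trigger several actions.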

The default condition (if none is specified) is to match every input line. The default action (if none is specified) is to print the line. So a clause consisting of nothing but an always-true condition, such as the one-character program 1, just prints the input text.

There is a lot of power in the condition part, which can be a logical expression or a regular expression. (One of the things that’s cool about AWK is that regular expressions are built-in.) There are also special BEGIN and END conditions, whose actions run, respectively, before the first input line is read and after the last one has been processed.
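For readers more at home in Python, here is a rough sketch of how those pieces line up. The function summarize, the pattern, and the length threshold of 20 are all invented for the example; the comments show the AWK clause each part is meant to mimic.

```python
import re


def summarize(lines):
    # BEGIN: runs once, before any input line is read.
    hits = 0
    big = 0

    for line in lines:
        # Regular-expression condition, like the AWK clause /awk/ { hits++ }
        if re.search(r"awk", line):
            hits += 1
        # Logical-expression condition, like length($0) > 20 { big++ }
        if len(line) > 20:
            big += 1

    # END: runs once, after the last input line.
    return hits, big


print(summarize(["awk is small", "python does a great deal more than awk", "sed"]))
```

In AWK the loop and the file handling are implicit; in Python you have to spell them out.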

In fact, a big clue that someone is using AWK where something else might be better is when there’s a lot of code in the BEGIN or END clause but few (or no) conditions to be matched.

But it’s just a clue. The AWK engine parses the input lines into parts easily accessed by AWK code (as $1, $2,…). It also has handy built-in NR and NF variables that provide the Number of Records read so far and the Number of Fields in the current line.
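As a rough illustration of what the engine hands you for free, here is a Python sketch of the same bookkeeping. The generator name records is made up, and it assumes plain whitespace-separated fields, which approximates AWK’s default splitting.

```python
def records(lines):
    """Yield (NR, NF, fields) for each line, roughly as AWK exposes them."""
    for nr, line in enumerate(lines, start=1):  # NR counts records from 1
        fields = line.split()  # split on runs of whitespace
        yield nr, len(fields), fields  # NF is the field count


for nr, nf, fields in records(["alpha beta gamma", "  one   two  "]):
    first = fields[0] if fields else ""  # AWK's $1 is fields[0] here
    print(nr, nf, first)
```

One difference to keep in mind: AWK fields are numbered from 1, while the Python list is indexed from 0.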

As such, you can use AWK to easily count the number of lines and words in a text file:

{ words += NF }
END { print NR, words }

The first clause has no condition, so it matches all lines. AWK allows variables to be created by using them, so words (which first time through defaults to zero) accumulates the total Number of Fields (words).

The second clause uses the END condition, so its action executes after all input lines have been processed. That action just prints the total Number of Records (lines) and total number of words.

In any event, you can perhaps begin to see that AWK is an unusual, but powerful, little tool. It can be quite captivating when you discover it for the first time.

That said, once you learn something like Perl or Python, which can do all AWK can do and a lot more, your love of AWK starts to fade. Even so, given the things the AWK engine does automagically, it still comes in handy from time to time.

For example, a Python program to count lines and words might look like this:

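import sys
NR, NW = 0, 0
with open(sys.argv[1]) as fp:  # file to count, named on the command line
    for txt in fp:
        NW += len(txt.split())  # whitespace-separated words, like AWK's NF
        NR += 1
print(NR, NW)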

Simple enough, but this is one place AWK is more expressive.

By the way: Technically, in the Unix world AWK is properly spelled awk, but it’s named after its authors, Alfred Aho, Peter Weinberger, and Brian Kernighan, so its name is an acronym. As such, for purposes of this post, the uppercase version works better for me.

Those who were around in the early MS-DOS era, especially in the corporate world, saw how people took to Lotus 1-2-3 (an early precursor to Excel, although not the first computer spreadsheet). I saw a lot of memos drafted on 1-2-3 just because you could format the text.

For those new to the computer world (a lot of secretaries, accountants, and office administrators), learning both Lotus 1-2-3 and a document editor — back then WordPerfect (or WordStar) — was serious overload.

So they often learned the one they had to learn (1-2-3) and eased off on the document editor because 1-2-3 could (pretty much) do the same thing.

So that’s the story on AWK hammers and nails.

It’s really a human tendency to use what we know (and know to be at least acceptably effective). Often it’s only when our tools just don’t work for something that we seek out new ones.

On the flip side, the more tools you learn about, the less attached to, or stuck with, any one of them you become. So we can add tools to theory and foundations as topics that make a good programmer!