Programmers, like carpenters, are builders — we make things. The work can be for pay, but carpenters, for example, can build their own bookshelves and doghouses. Programmers also make software for themselves, sometimes to amuse, sometimes to provide a useful function.

A few of the apps I created for myself over the years turned out to be major workhorses for me — tools I used frequently. One of the earliest was PF.EXE, my Process File utility.

This was back in the late 1980s and early 1990s. Both at work and at home, I used PCs running MS-DOS. Windows 3.x happened in the early 1990s; by 1997 it was Windows 95. Especially as a programmer, a lot of the work I was doing was at the MS-DOS level, not the Windows level.

Text has always figured heavily in both my hobby and professional work (especially the latter), so the need to manipulate text files in various ways was important. It became even more important when I began using Unix at work a lot.

One bane of computer life is how text files have different line-end protocols on the three major platforms. On Unix platforms, a line-end is a LINEFEED (LF) character (code value 10). On Apple platforms, a line-end is (or was) a CARRIAGE RETURN (CR) character (code value 13). And on MS-DOS, it’s both: a CR followed by an LF.
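To make the difference concrete, here is a minimal C sketch of the MS-DOS-to-Unix direction (dropping the CR from each CR+LF pair). It is not PF's code; the function name `crlf_to_lf` and its buffer-based interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copy len bytes from src to dst, dropping every CR that is immediately
 * followed by an LF. A lone CR (old Apple style) is copied unchanged.
 * dst must have room for len bytes. Returns the number of bytes written. */
size_t crlf_to_lf(const char *src, size_t len, char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
            continue;   /* skip the CR; the LF is copied on the next pass */
        dst[out++] = src[i];
    }
    return out;
}
```

Going the other way (LF to CR+LF) can grow the text, so the output buffer would need up to twice the input size.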

[The modern equivalent, based on what I see around the web, is probably issues between the old CP-1252 text encoding and Unicode. I see a lot of â€™ and â€” sequences where there should be ’ and — characters.]

Another common programmer annoyance involves TAB characters. Some (like me) avoid them like the plague; some deluded fools use them (some are oblivious to the difference and wonder why their indenting looks crazy sometimes). In any event, it’s nice to have a utility that converts TAB characters to some number of spaces. (A smart one can convert groups of leading spaces to TAB chars!)
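A TAB expander is a small piece of code. The C sketch below pads each TAB out to the next tab stop. Whether PF used tab stops or a fixed number of spaces isn't stated here, so the tab-stop behavior, the name `expand_tabs`, and its parameters are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Expand each TAB in the NUL-terminated string src into spaces, padding
 * to the next multiple of tab_width. Writes at most cap-1 characters plus
 * a terminating NUL into dst. Returns the number of characters written. */
size_t expand_tabs(const char *src, char *dst, size_t cap, size_t tab_width)
{
    size_t out = 0;   /* next write position in dst */
    size_t col = 0;   /* current column on the output line */

    assert(src != NULL && dst != NULL && cap > 0 && tab_width > 0);
    for (; *src != '\0' && out + 1 < cap; src++) {
        if (*src == '\t') {
            size_t pad = tab_width - (col % tab_width);
            while (pad-- > 0 && out + 1 < cap) {
                dst[out++] = ' ';
                col++;
            }
        } else {
            dst[out++] = *src;
            col = (*src == '\n') ? 0 : col + 1;   /* new line resets column */
        }
    }
    dst[out] = '\0';
    return out;
}
```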

§

The point is, programmers need to work like this with text files a lot. Sometimes we just want to count the words or lines of a file; sometimes we want to make changes. It’s an everyday thing.

In the Unix world, there are lots of standard utilities that accomplish a lot of this. The awk and sed utilities, to name just two, offer a huge amount of power. (The wc utility exists solely and explicitly to count characters, words, and lines, in a file.)
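For a sense of scale, the core of a wc-style counter fits in a few lines of C. This is only a sketch of the idea, not the real wc source; the `struct counts` type and the `count_text` name are made up for the example, and a "word" is assumed to be any run of non-whitespace characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct counts {
    size_t chars;   /* every byte in the text */
    size_t words;   /* runs of non-whitespace  */
    size_t lines;   /* number of LF characters */
};

/* Count characters, words and lines in a NUL-terminated string. */
struct counts count_text(const char *s)
{
    struct counts c = {0, 0, 0};
    int in_word = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        c.chars++;
        if (*s == '\n')
            c.lines++;
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;    /* first character of a new word */
            c.words++;
        }
    }
    return c;
}
```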

But the MS-DOS world didn’t come with any of those tools, so I had to “roll my own” file processing utilities. The need was great enough that I began doing it early; by the 1990s it was standard practice.

Which brings me to PF.EXE, version 2.30, from 1992.

(I found an old ZIP file in an archive. I’m amazed I found this later version. The earliest versions are lost in the dusts of 3.5″ floppy history.)

§

Here’s the help screen you’d get with PF /?

PF v2.30, May 92 ==

 usage: PF infilename [outfilename] [switch(es)]

 Output defaults to stdout if outfilename not specified.

 For switches, the first two characters are required (switch names
 are not case-sensitive). Switches may appear anywhere on the on
 the command line, but processing follows command line order.

 Use / or - as switch char.

 Values can be: /SWX={number}, /SWX@{character}, /SWX:{string}
 Numbers can be: 0xhh (hex), ddd (decimal) or 0ooo (octal).

-HXdump    Hexidecimal Display
-LNumbr    Add Line Numbers
-CCtr      Count Characters

-XTabs[=X] Expand {TAB} to SPACES
-XCr       Expand {CR} to {CR}{LF}
-XLf       Expand {LF} to {CR}{LF}
-LFonly    Reduce {CR}{LF} to {LF}
-CRonly    Reduce {CR}{LF} to {CR}
-EOl=X     Convert {X} to {CR}{LF}
-CTrl=M    Control Chars -> ASCII
-GRfx=M    IBM GFX Chars -> ASCII
-TRan=X,Y  Translate {X} to {Y}
-STrip=X   Strip Char {X}
-ESc@C     Convert 'C' to {ESC}
-NAnsi     Remove ANSI sequences

-UCase     Convert to UPPER CASE
-LCase     Convert to lower case
-EBcdic    IBM EBCDIC to ASCII
-RCase     Reverse Case of Alphas
-ROT13     Rotate Alphas 13 Chars

-WRap=L    Wrap at L Columns
-BLock     80 Character Blocks
-CLip=F,T  Clip from F to T cols
-CUt=F,T   Cut from F to T lines
-COmpress  Remove Extra Spaces (>1)
-7Bit      Make Bit-8 Always 0
-ANd=X     AND With {X}
-OR=X      OR With {X}
-XOr=X     XOR With {X}
-ENcode:S  Encode Using String S

-FASC=L    Find ASCII Strings
-QUotes    Extract Quoted Text
-REms=m    Extract Code Comments
-SCan      Scan for () {} [] pairs
-CHar@C    Test For Char 'C'
-TEst=X    BIT TEST With {X}

/HDR       Add header/trailer
/OWrite    Overwrite output file
/APpend    Append to output file
/INline    Take input from stdin
/?Ctrl     Ctrl Modes Help Display
/DOts      Show 'work' dots
/IBuff=B   Set input buffer size
/OBuff=B   Set output buffer size
/?Grfx     Grfx Modes Help Display

So by version 2.30, it was a pretty full-featured little utility. I’d been adding capabilities as I needed them for several years.
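The number formats in the help screen (0xhh hex, ddd decimal, 0ooo octal) happen to match what the standard C `strtol` function accepts when called with base 0. PF's actual parser isn't shown here, so take this as a sketch of one way to get that behavior; `parse_switch_number` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse the text after "=" in a switch such as /SWX=0x1A.
 * Accepts 0x.. (hex), 0.. (octal) or plain decimal.
 * Returns 1 and stores the result in *value on success, 0 on failure. */
int parse_switch_number(const char *text, long *value)
{
    char *end;
    long v;

    assert(text != NULL && value != NULL);
    errno = 0;
    v = strtol(text, &end, 0);   /* base 0 lets the prefix pick the radix */
    if (end == text || *end != '\0' || errno == ERANGE)
        return 0;                /* empty, trailing junk, or out of range */
    *value = v;
    return 1;
}
```

With this, "0x1A", "26" and "032" all parse to the same value, 26.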

§

Most of the actual processing routines were written in 808x assembler. I did a fair amount of that back in those days — it was the best way to have full command of the system. Only the program shell was in C.

Object-oriented C (and object-oriented assembler), no less! By the 1990s I was pretty serious about object-oriented design, and had found ways to use its techniques in plain old C (and in assembler).
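For anyone who hasn't seen the trick, one common way to do objects in plain C is a struct that carries its own function pointers, with the object passed explicitly as the first argument. That is the same shape as calls like pCmndLine->GetArgs(pCmndLine, argc, argv) in the listing that follows. The `Counter` type below is invented purely for illustration and is not part of PF.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Counter Counter;

/* An "object": data plus pointers to the functions that act on it. */
struct Counter {
    long total;
    void (*Add)(Counter *self, long n);
    long (*Get)(const Counter *self);
};

static void counter_add(Counter *self, long n) { self->total += n; }
static long counter_get(const Counter *self)   { return self->total; }

/* "Constructor": fills in the data and wires up the methods. */
Counter *init_counter(Counter *self)
{
    assert(self != NULL);
    self->total = 0;
    self->Add   = counter_add;
    self->Get   = counter_get;
    return self;
}
```

A caller would write `c.Add(&c, 5)`, which reads a lot like a method call even though the language knows nothing about objects.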

FWIW, here’s the PF.C file in all its glory:

/****************************************************************************
 *****                                              PROCESS FILE II     *****
 *****                                                 version 2.30     *****
 *****                                                  May 3, 1992     *****
 ****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <std.h>
#include <abend.h>
#include <inpargs.h>
#include "pf.h"

extern o_JobList    jobList         ;
extern o_Bridge     charBridge      ;
extern swx          aJobSwitches [] ;
extern emsg         aErrorMsgs []   ;
extern char*        linef []        ;
extern swxjob       SwxHelp         ;

#define HDR_DLINE   linef [0]
#define HDR_SLINE   linef [1]
#define HDR_TEXT    linef [2]
#define TLR_TEXT    linef [3]
/****************************************************************************
 *****                                                         DATA     *****
 ****************************************************************************/
ABEND*   pErrs     = NULL;  /* handle->oErrHandler  */
inpargs* pCmndLine = NULL;  /* handle->oCommandLine */

static cptr aFileNames [MAX_ARGNMS]; /* Table of Cmd Line Names */
static cptr pInFileName  = NULL;
static cptr pOutFileName = NULL;

void
main(int argc, char *argv [])
{
    indx dots=0; /* Next show is at: xxx  */

    /*-------------------------------------------------------*/
    /* Initialize the ErrorHandler and InputArgs objects.    */
    /*-------------------------------------------------------*/
    pErrs = INIT_ABEND(NULL, aErrorMsgs, NULL);
    pCmndLine = INIT_inpargs(NULL, aJobSwitches, MAX_ARGNMS, aFileNames);

    /*-------------------------------------------------------*/
    /* Process the command line (input args).                */
    /*-------------------------------------------------------*/
    pCmndLine->GetArgs(pCmndLine, argc, argv);
    if(isBAD(pCmndLine->status))
        ERR_EXIT(pCmndLine->status, pCmndLine->pErrText);

    if(aFileNames[0][0] EQU '?')
        SwxHelp(pCmndLine, X_HELP);
    /*-------------------------------------------------------*/
    /* Open Input and Output and initialize buffers          */
    /*-------------------------------------------------------*/
    if(SWXX_(pCmndLine, X_INLINE)) {
        pInFileName  = NULL;
        if(pCmndLine->iNbrNames EQU 0)
            pOutFileName = NULL;
        else
            pOutFileName = aFileNames[0];
    }
    else
        switch(pCmndLine->iNbrNames) {
        case 0:
            ERR_EXIT( E_NOARGS, NULL );
            break;
        case 1:
            pInFileName  = aFileNames[0];
            pOutFileName = NULL;
            break;
        default:
            pInFileName  = aFileNames[0];
            pOutFileName = aFileNames[1];
            break;
        }
    ifNOT(pOutFileName) {
        ifNOT(SWXX_(pCmndLine, X_OBUFF))
            charBridge.outBufBlk.iLength = 20L;
        ++SWXX_(pCmndLine, X_PAUSE);
    }
    else  {
        ++SWXX_(pCmndLine, X_HDR);
        ++SWXX_(pCmndLine, X_DOTS);
    }
    charBridge.Open(pInFileName, pOutFileName);

    jobList.Open();
    if(isBAD(jobList.status))
        ERR_EXIT(jobList.status, NULL);
    /*-------------------------------------------------------*/
    /* If /HDR switch is on, print the Header.               */
    /*-------------------------------------------------------*/
    if(SWXX_(pCmndLine, X_HDR)) {
        fputs(HDR_DLINE, stderr);
        fprintf(stderr, HDR_TEXT, (pInFileName?strupr(pInFileName):"stdin"));
        jobList.Print(stderr);
        fputs(HDR_SLINE, stderr);
    }
    /*-------------------------------------------------------*/
    Forever {
        /*---------------------------------------------------*/
        charBridge.Input();
        if(isBAD(charBridge.status))
            if(charBridge.status EQU E_EOF)
                break;
            else
                ERR_EXIT(charBridge.status, aFileNames[0]);
        /*---------------------------------------------------*/
        jobList.Exec(&(charBridge.pChr));
        if(isBAD(jobList.status))
            ERR_EXIT(jobList.status, NULL);
        /*---------------------------------------------------*/
        charBridge.Output(jobList.status);
        if(isBAD(charBridge.status))
            ERR_EXIT(charBridge.status, aFileNames[1]);
        /*---------------------------------------------------*/
        if(SWXX_(pCmndLine, X_DOTS) AND (++dots>255)) {
            fputc('.', stderr);
            dots=0;
        }
        /*---------------------------------------------------*/
        if(SWXX_(pCmndLine, X_PAUSE) AND (SpaceBarPause() EQU ESC))
            break;
        /*---------------------------------------------------*/
    }
    /*-------------------------------------------------------*/
    jobList.Close();
    if(isBAD(jobList.status))
        ERR_EXIT(jobList.status, NULL);

    charBridge.Close( );
    if(isBAD(charBridge.status))
        ERR_EXIT(charBridge.status, aFileNames[1]);
    /*-------------------------------------------------------*/
    /* If /HDR switch is on, print the Trailer.              */
    /*-------------------------------------------------------*/
    if(SWXX_(pCmndLine, X_HDR)) {
        fputs(HDR_SLINE, stderr);
        fprintf(stderr, TLR_TEXT,  charBridge.inBufBlk.nChars  ,
                                   charBridge.outBufBlk.nChars ,
                                   charBridge.inBufBlk.nLines  ,
                                   charBridge.outBufBlk.nLines );
        fputs(HDR_DLINE, stderr);
    }
    /*-------------------------------------------------------*/
    if(SWXX_(pCmndLine, X_DEBUG)) {
        fprintf(stderr, "Input buffer:  @%p, %4u bytes \n",
                charBridge.inBufBlk.pBuff, charBridge.inBufBlk.iLength);
        fprintf(stderr, "Output buffer: @%p, %4u bytes \n",
                charBridge.outBufBlk.pBuff, charBridge.outBufBlk.iLength);
        fprintf(stderr, "PAUSE: %i, INLINE: %i, DOTS: %i, HDR: %i, OWR: %i\n",
                aJobSwitches[X_PAUSE ].xToggle ,
                aJobSwitches[X_INLINE].xToggle ,
                aJobSwitches[X_DOTS  ].xToggle ,
                aJobSwitches[X_HDR   ].xToggle ,
                aJobSwitches[X_OWRITE].xToggle );
    }
    exit(0);
}
/*eof*/

I had some weird ideas about macros and Hungarian Notation back in those days. I eventually outgrew those insanities.

Also, FWIW, here’s one of the assembly routines:

NAME    PFLCASE
PAGE    60,132
%OUT Assembling: PF_LCASE.ASM
INCLUDE  pf.hdr
@CODE
ExecLCase LABEL NEAR
        push    si          ; ---\
        mov     si, sp      ; SI -> stack_frame
        mov     si, [si+4]  ; Point to The Byte.
        mov     al, [si]    ; Get The Byte.
        cmp     al, 'A'     ; If it's below an 'A':
        jb      LC_1        ; . skipit-->
        cmp     al, 'Z'     ; Or above a 'Z':
        ja      LC_1        ; . skipit-->
        xor     al, CAPS_M  ; Else: Toggle the "CAPS" bit.
        mov     [si], al    ; And re-write The Byte.
LC_1:  @Zero    ax          ; EXIT (WR).
        pop     si          ; ---/
        ret                 ; ***** RETURN *****
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
PostLCase LABEL NEAR
        lea     ax, sLCase  ; Point to id text string.
        jmp     PostJob     ; GO POST IT==>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_SwxLCase LABEL NEAR
        lea     ax, LCase   ; Point to our job.
        jmp     AddJob      ; GO ADD IT==>
PUBLIC _SwxLCase
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@DATA
EVEN
LCase   job   <NULL, ExecLCase, PostLCase, NULL, FALSE, 0>
sLCase  db    "  lcase",0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@END

I kinda miss those days a little (but I mostly really don’t).
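For readers who don't speak 808x assembler, the ExecLCase routine above does roughly what this C sketch does. The value 0x20 for the CAPS_M mask is an assumption (it is the bit that separates upper and lower case in ASCII; the real definition lives in pf.hdr, which isn't shown), and `exec_lcase` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

#define CAPS_MASK 0x20   /* assumed value of CAPS_M: the ASCII case bit */

/* If the byte is an upper-case letter, flip its case bit so it becomes
 * lower case; leave every other byte alone. Returns 0, mirroring the
 * "@Zero ax" at the end of the assembler routine. */
int exec_lcase(unsigned char *p)
{
    assert(p != NULL);
    if (*p >= 'A' && *p <= 'Z')
        *p ^= CAPS_MASK;
    return 0;
}
```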

§

What’s most notable about this version of PF is that it abstracts the process and uses an object-oriented plug-in approach to processing files. There’s an overall framework that reads and writes the file, but what happens to the file data is dispatched to the selected routines.
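In rough terms, the plug-in idea looks like the C sketch below: the framework keeps an ordered list of routines and runs each byte through them in turn, following command line order as the help screen describes. None of these names (`job_fn`, `job_list`, `job_ucase`, `job_rot13`) come from PF, and the real version worked through the jobList and charBridge objects rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_JOBS 8

/* Every plug-in has the same signature: edit one byte in place. */
typedef int (*job_fn)(unsigned char *ch);

struct job_list {
    job_fn jobs[MAX_JOBS];
    size_t count;
};

int job_ucase(unsigned char *ch)
{
    if (*ch >= 'a' && *ch <= 'z')
        *ch = (unsigned char)(*ch - 0x20);
    return 0;
}

int job_rot13(unsigned char *ch)
{
    unsigned char c = *ch;
    if (c >= 'a' && c <= 'z')
        *ch = (unsigned char)('a' + (c - 'a' + 13) % 26);
    else if (c >= 'A' && c <= 'Z')
        *ch = (unsigned char)('A' + (c - 'A' + 13) % 26);
    return 0;
}

/* Append a job; returns 1 on success, 0 if the list is full. */
int job_list_add(struct job_list *jl, job_fn fn)
{
    assert(jl != NULL && fn != NULL);
    if (jl->count >= MAX_JOBS)
        return 0;
    jl->jobs[jl->count++] = fn;
    return 1;
}

/* The framework loop: pass each byte through every job, in order. */
void job_list_run(const struct job_list *jl, unsigned char *buf, size_t len)
{
    assert(jl != NULL && buf != NULL);
    for (size_t i = 0; i < len; i++)
        for (size_t j = 0; j < jl->count; j++)
            jl->jobs[j](&buf[i]);
}
```

Adding a new capability then means writing one more function with the same signature and registering it; the read/write loop never changes.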

I’ll pick up that idea in a future post.