The last story was about PF.EXE, a file-processing utility I wrote for my own uses way back when. That one was a combination of C code and 8086 assembler, written for MS-DOS (worked fine in Windows), that read and wrote disc files. It had a toolkit of things it could do to them, depending on command line switches.
Many years later, using Java, I created more capable versions, the culmination of which was a suite called DataBridge. It turned out to be some of the most valuable work I ever did for The Company.
Shortly before I retired, there was a situation with one division I used to support that had a lot of customer data stored off-site in a third-party system (an early precursor to cloud computing). The original motivation was cost effectiveness, but vendor issues, accessibility issues, and capability issues soured the relationship over time.
Eventually the powers decided to dump the whole thing. The problem was getting back all of that data. One of the issues was that it was challenging to use the vendor’s web services API to access the data.
Someday I’ll tell the story about the solution, DataCollector, I designed for them (and the co-worker who nearly made it a disaster). Essentially I created a system that worked a little bit like an SQL query with a web services back end. That allowed us to do the usual database operations (SELECT, INSERT, UPDATE, DELETE) with that online data.
Anyway, at that point my DataBridge tool, which I’d been developing for years, had grown to include input and output drivers for ODBC. It could read and write to our online databases.
About a month before I was to walk out the door for the final time, a group from that division came to me and asked me if there was any possible way I could help them retrieve all their data. They had no viable solution and were desperate.
Well, as it happens, by pairing DataCollector (to query the remote data and save it in TAB files) and DataBridge (to write TAB files into a corporate database), I could do what they needed in the month I had left. (I was also trying to get a final project out the door before I left, so it was an interesting final month.)
It was a conversation where my first response was, “You want what? In a month?! Um…” But once I’d had a chance to think about it, I realized it was doable. They were pretty happy with the results.
§
As an aside, I’ve always viewed code — certainly my own — as something of a living thing that grows over time.
I’ve never really been one who, when a coding session breaks things, reverts to an older version and starts over. (I hate redoing work, for one thing.) I’m more prone to keep moving forward and fix what I broke.
I’m also prone, when using one of my own tools and finding it wanting, to interrupt what I’m doing and go tweak the tool to make it better. All my software tools have evolved slowly over time to become better and better.
§
Sometimes things require a major change or even a blank slate. DataBridge was definitely a blank slate.
It was written in Java, for one thing. Back then, that’s what corporate was using for most of its IT development. (There was a fair amount of J2EE, so it made sense.)
I’d given Java a look back when it first came out (and was mainly for little embedded web apps), but (as a C++ programmer) wasn’t terribly impressed. (I thought it was pretty funny how main() had to be wrapped in a class, for one thing.)
But as Java got more sophisticated it became a very good language, and I grew to love it. It was nice not to worry about memory allocation and cleanup, since the garbage collector handles that! And I was utterly sold on object-oriented design (still am).
FTR: The first languages I used were BASIC and several varieties of assembler (including Knuth’s MIX). I migrated to C and C++, along with Visual BASIC, as I grew into serious programming. My final era at work involved a lot of Java, along with SQL and JavaScript. In my retirement, I mostly write Python code.
(In that mix there was also a short Perl era, a brief Lisp era (but, god, I loved Lisp), and an even briefer Smalltalk era. And I’ve stuck my toe into everything from Ada to Z80 assembler. I enjoy learning — and creating — new languages.)
§
DataBridge is a large body of Java code. Lots of classes (and I like to think it has lots of class 😉 ).
Because the suite was so capable, command line switches were out. Instead, the app took a task name and the filename of a configuration file that told it what to do. The task name called out a task specified in the file.
The configuration file was a standard Unix format that looked like this:
##============================================================
## GLOBAL PROPERTIES
##============================================================
*.RecordCounter.MaxRecords = -1
*.Input.Driver = com.crma.databridge.input.DBQueryInput
*.Input.Path = C:\\CRM\\projects\\J-CRM\\DataBridge\\data
*.Input.Encoding = UTF-8
*.Input.HasHeader = Yes
*.Input.ODBC = mdbODIM
*.Input.Fields = *
*.Mapper.Driver = com.crma.databridge.mapper.AutoMap
*.Mapper.Path = C:\\CRM\\projects\\J-CRM\\DataBridge\\maps
*.Mapper.TextFilter = TAB CR LF VT
*.Mapper.NullReplace =
*.Mapper.LogTheMap = No
*.Output.Driver = com.crma.databridge.output.FileOutputTab
*.Output.Path = C:\\CRM\\projects\\J-CRM\\DataBridge\\data
*.Output.Filename = output_{!}.tab
*.Output.Encoding = UTF-8
*.Output.UseBOM = No
*.Output.UseHeader = No
*.Output.LineEnd = CRLF
*.Output.Delimiter = ~
The above global properties are defaults for “tasks” — there can be many tasks in one configuration file (to avoid having tons of config files).
A task is identified by a name in the first segment of the property name. The above are global properties because they use an asterisk. A task is configured like this:
##============================================================
## TASK PROPERTIES
##============================================================
Main.Name = Main Task
Main.Description = The task description.
Main.Input.Driver = com.crma.databridge.input.FileInputFileSpec
Main.Input.Path = C:\\CRM\\OnDemand\\ws\\maps
Main.Input.Filespec = .*\\.map
Main.Input.ODBC = obxTest
Main.Input.SQL = instances
Main.Input.Name = AprStats
Main.Mapper.Driver = com.crma.databridge.mapper.NullMap
Main.Mapper.LogTheMap = Yes
Main.Mapper.Name = Main
Main.Mapper.Filename = input.map
Main.Mapper.TextFilter = TAB CR LF VT
Main.Mapper.NullReplace = *
#Main.Mapper.Cols = id,alias,[…],status,created,updated
Main.Output.Driver = com.crma.databridge.output.FileOutputPipe
Main.Output.ODBC = mdbDMOD
Main.Output.Filename = output_{!}.tab
Main.Output.UseHeader = Yes
Main.Output.Table = instance
Main.Output.FieldTypes = id:number,[…],pagesize:number
That specifies a task named “Main”. The global properties could also specify the name of a default task to perform if none was provided on the command line.
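To make the global-versus-task idea concrete, here is a minimal sketch of how such a lookup might work: try the task-specific key first, then fall back to the asterisk (global) key. This is not the actual DataBridge code. The class name TaskConfig and its methods are invented for illustration, and parsing with java.util.Properties is an assumption (the doubled backslashes in the samples above happen to match that format's escaping rules).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not DataBridge's real class.
final class TaskConfig {
    private final Properties props;
    private final String task;

    TaskConfig(Properties props, String task) {
        this.props = props;
        this.task = task;
    }

    /** Returns "<task>.<key>" if present, otherwise the global "*.<key>" (or null). */
    String get(String key) {
        String value = props.getProperty(task + "." + key);
        return (value != null) ? value : props.getProperty("*." + key);
    }

    /** Parses configuration text (assumed to be java.util.Properties syntax). */
    static TaskConfig parse(String text, String task) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new TaskConfig(p, task);
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "## comment lines are skipped",
            "*.Input.Encoding = UTF-8",
            "*.Output.UseHeader = No",
            "Main.Output.UseHeader = Yes");
        TaskConfig cfg = TaskConfig.parse(sample, "Main");
        System.out.println(cfg.get("Input.Encoding"));   // only the global value exists
        System.out.println(cfg.get("Output.UseHeader")); // the task value overrides the global
    }
}
```

With this arrangement a task only needs to list the properties that differ from the defaults, which is presumably why the samples above can keep the task section fairly short.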
§
These configuration samples show how DataBridge divides into three parts: Input, Mapper, and Output.
The task configuration names Java classes that the application plugs in as drivers to read (Input drivers), manipulate (Mapper drivers), and write (Output drivers) data.
This architecture allows writing new drivers for new situations. Any number of new drivers can be extended from the base classes or written to the interface specification. New classes can be added to the application’s JAR file or supplied in a new JAR file of their own.
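Plugging in a driver by class name is typically done with reflection in Java. The sketch below shows one common way to do it, under stated assumptions: the InputDriver interface, the DemoInput class, and the DriverLoader helper are all hypothetical names, and the real DataBridge base classes and interfaces are not shown in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver contract; the real DataBridge interfaces are not shown here.
interface InputDriver {
    List<String[]> readAll();
}

// A toy driver standing in for something like a file or ODBC input class.
class DemoInput implements InputDriver {
    @Override
    public List<String[]> readAll() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "alpha"});
        rows.add(new String[] {"2", "beta"});
        return rows;
    }
}

final class DriverLoader {
    /** Instantiates the named class and checks that it implements the expected type. */
    static <T> T load(String className, Class<T> type) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            return type.cast(instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load driver: " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real run the name would come from a property such as Main.Input.Driver.
        String driverName = DemoInput.class.getName();
        InputDriver input = load(driverName, InputDriver.class);
        System.out.println(input.readAll().size() + " rows read");
    }
}
```

Because the class is found by name at run time, a new driver only has to be on the classpath (in the application JAR or a separate one) for a configuration file to use it.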
It made for a very powerful tool, an early version of the more sophisticated (and rather expensive) enterprise tools that sprang up in the latter part of the 2010s.
There were two crucial aspects to this: the I/O drivers and the Mapper drivers.
The I/O drivers read and wrote the data — not a huge deal. Input drivers grabbed data in whatever way was appropriate and presented it in a unified fashion (for the Mapper). Output drivers took unified data (from the Mapper) and wrote it in whatever way was appropriate.
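The read, map, write flow can be sketched as a simple loop over records in a unified form. Everything here is an assumption for illustration: the interface names (RecordSource, RecordMapper, RecordSink), the choice of a field-name-to-value map as the unified record, and the Pipeline class are invented, not taken from DataBridge.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contracts; a record is modeled as an ordered field-name -> value map.
interface RecordSource { Iterator<Map<String, String>> records(); }
interface RecordMapper { Map<String, String> map(Map<String, String> in); }
interface RecordSink   { void write(Map<String, String> rec); }

final class Pipeline {
    /** Pulls each record from the source, maps it, writes it, and returns the count. */
    static int run(RecordSource in, RecordMapper mapper, RecordSink out) {
        int count = 0;
        Iterator<Map<String, String>> it = in.records();
        while (it.hasNext()) {
            out.write(mapper.map(it.next()));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for real input and output drivers.
        List<Map<String, String>> data = new ArrayList<>();
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("alias", "alpha");
        data.add(row);
        List<Map<String, String>> written = new ArrayList<>();

        // A trivial mapper that upper-cases every value.
        RecordMapper upper = rec -> {
            Map<String, String> result = new LinkedHashMap<>();
            rec.forEach((k, v) -> result.put(k, v.toUpperCase()));
            return result;
        };

        int n = run(data::iterator, upper, written::add);
        System.out.println(n + " record(s) written: " + written);
    }
}
```

The point of the unified record is that the loop never needs to know whether the source is a TAB file or a database query, or where the output ends up.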
Whereas PF.EXE only read and wrote disc text or data files, DataBridge not only knew about all sorts of disc files (data, text, CSV, TAB, XML, etc.), it also did ODBC queries.
The big deal was the Mapper drivers. I wanted to be able to, for example, map a TAB-delimited input file to an XML output file. Or vice-versa. The idea was to map any (tabular) input to any (tabular) output, along with some raw data processing ability.
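As a small illustration of the TAB-to-XML case, the sketch below turns one TAB-delimited line into one XML element. It is only a toy under stated assumptions: the TabToXml class, the record element name, and the escaping rules are made up for this example, and it assumes the field names are already valid XML element names.

```java
// Hypothetical example; not the DataBridge mapper.
final class TabToXml {
    /** Escapes the characters that are unsafe in XML element text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Converts one TAB-delimited line into a <record> element, one child per field. */
    static String toXml(String[] fields, String tabLine) {
        // The -1 limit keeps trailing empty columns instead of dropping them.
        String[] values = tabLine.split("\t", -1);
        StringBuilder sb = new StringBuilder("<record>");
        for (int i = 0; i < fields.length; i++) {
            String value = (i < values.length) ? values[i] : "";
            sb.append('<').append(fields[i]).append('>')
              .append(escape(value))
              .append("</").append(fields[i]).append('>');
        }
        return sb.append("</record>").toString();
    }

    public static void main(String[] args) {
        String[] fields = {"id", "alias"};
        System.out.println(toXml(fields, "7\tA&B"));
        // prints <record><id>7</id><alias>A&amp;B</alias></record>
    }
}
```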
§
This has gotten long, so maybe I’ll pick it up another time.
∅