Last time I started the story of my DataBridge application — a Java-based tool for transferring and transforming tabular data, such as TAB, CSV, and XML files. It could also read from and write to ODBC tables.
The app itself was just a framework that implemented a basic IPO (Input-Process-Output) model to transfer data. The details were up to the Input, Process (in this case, Mapping), and Output drivers loaded at run time.
This architecture allowed the app to process any kind of data for which one could write a driver, so new drivers could be written as needed down the line. An Input, Mapping, or Output driver just needs to implement the respective interface. (Or it can extend an existing class and specialize its behavior.)
The app loads the new driver class by (package) name so long as it can find the compiled class (in places specified by the CLASSPATH variable).
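As a rough illustration of loading a driver by name, here is a minimal sketch using standard Java reflection. It is not the DataBridge loader itself; the Driver interface, EchoDriver class, and DriverLoader helper are hypothetical names made up for this example.

```java
// Hypothetical stand-in for one of the driver interfaces.
interface Driver {
    String describe();
}

// A trivial driver that the loader can find on the classpath.
class EchoDriver implements Driver {
    public String describe() { return "EchoDriver ready"; }
}

class DriverLoader {
    // Load a class by its fully qualified name, check that it implements
    // the expected interface, and create an instance of it.
    static <T> T load(String className, Class<T> type) {
        try {
            Class<?> cls = Class.forName(className);
            if (!type.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                        className + " does not implement " + type.getName());
            }
            return type.cast(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            // Covers a class that is missing from the classpath or has no usable constructor.
            throw new IllegalArgumentException("Cannot load driver " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real run the name would come from a config file, not from code.
        Driver d = load(EchoDriver.class.getName(), Driver.class);
        System.out.println(d.describe());
    }
}
```

The type check up front means a misconfigured class name fails with a clear message rather than a ClassCastException somewhere later in the run.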
As you’ll see, the heart of DataBridge — what made it so useful — was the mapping that transformed input records to output records. (That interface is complex enough that I’ll either have to only briefly describe it, or devote a third post to it.)
§
For reasons having to do with reporting and persistence, many of my more complex object classes implement my IDisplayable interface:

    public interface IDisplayable {
        public String toString();
        public String toHtml();
        public String toXml();
    }
    //eof
The toString() method returns a printable string for reporting and logging. In general, you should always give your objects a “toString” method if the language supports it. It’s a great way to help with logging and testing.
The toHtml() method is essentially the same thing as toString() except that the returned string has HTML tags as appropriate (bold and italic, for instance). It’s intended for a web interface rather than plain-text output.
The toXml() method is intended for object persistence, storage, or logging. The returned string is expected to document the object appropriately. In some cases, it contains enough data to re-create the object instance.
Obviously, the exact strings returned depend on the programmer and on how each object is meant to be used.
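To make the three methods concrete, here is a small sketch of a class that implements them. The Customer class and its escape() helper are hypothetical and not part of DataBridge; the interface is repeated (package-private) only so the example compiles on its own.

```java
// Copy of the interface described above, repeated so this sketch stands alone.
interface IDisplayable {
    String toString();
    String toHtml();
    String toXml();
}

// Hypothetical example class, not taken from the DataBridge source.
class Customer implements IDisplayable {
    private final String name;
    private final int id;

    Customer(String name, int id) {
        this.name = name;
        this.id = id;
    }

    // Plain text for logs and reports.
    @Override
    public String toString() {
        return "Customer[" + id + "] " + name;
    }

    // Same information with light HTML markup for a web page.
    public String toHtml() {
        return "<b>Customer " + id + "</b>: <i>" + escape(name) + "</i>";
    }

    // Enough detail to rebuild the object later.
    public String toXml() {
        return "<customer id=\"" + id + "\"><name>" + escape(name) + "</name></customer>";
    }

    // Minimal escaping so names containing markup characters stay well-formed.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```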
§
Within the DataBridge package, the IPO classes implement IDataBridgeObject, which extends IDisplayable:

    public interface IDataBridgeObject extends IDisplayable {
        public boolean err();
        public String errmsg();
    }
    //eof
All this does is add the notion of an error to those objects. If the err() method returns true, then the errmsg() method returns an error message. If the err() method returns false, then errmsg() returns an empty string.
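One simple way to honor that contract is to derive err() from the message itself, so the two methods can never disagree. This is only a sketch under that assumption; the IErrorState and ErrorStateBase names are hypothetical, and the display methods are left out to keep it short.

```java
// Trimmed copy of the error half of IDataBridgeObject (display methods omitted).
interface IErrorState {
    boolean err();
    String errmsg();
}

// Hypothetical base class that a driver could extend.
class ErrorStateBase implements IErrorState {
    // An empty message means "no error".
    private String message = "";

    protected void setError(String msg) {
        message = (msg == null) ? "" : msg;
    }

    protected void clearError() {
        message = "";
    }

    // True exactly when there is a message to report.
    public boolean err() {
        return !message.isEmpty();
    }

    public String errmsg() {
        return message;
    }
}
```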
§
For the input and output drivers, IDataChannel defines the input/output interface:

    public interface IDataChannel extends IDataBridgeObject {
        public void initialize(AppProperties props) throws Exception;
        public void open() throws Exception;
        public void close();
        public boolean ready();
    }
    //eof
The initialize() method, which is called at startup, takes a properties object containing named properties the driver accesses to configure itself.
The open() and close() methods respectively start and stop the data transfer session. For example, with a disk file, these would literally open and close the file. With other forms of data, another process may be required.
The ready() method is a context-dependent flag. For input drivers, it means data is available. For output drivers, it means data can be written.
§
All input drivers implement IDataInput:

    public interface IDataInput extends IDataChannel {
        static public final String GROUP_NAME = "Input";
        public int getColumnCount() throws Exception;
        public String[] getColumnNames() throws Exception;
        public Object[] getRecord() throws Exception;
    }
    //eof
The getColumnCount() method returns the number of columns in a data record.
Remember that DataBridge is tabular-data oriented. It expects data to come in tables with rows of repeating column patterns.
The getColumnNames() method returns an array of strings that name the columns. The mapper uses this to identify the record fields.
The getRecord() method is called so long as data records are available. It returns an array of objects, each object a record field. The order of the objects in the array matches the order of column names.
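To show how the channel and input methods fit together, here is a sketch of a tiny in-memory input driver and the read loop that would drive it. It is an illustration, not the real code: java.util.Properties stands in for AppProperties, the SimpleInput interface is a trimmed stand-in for IDataInput (no checked exceptions, no error methods), and the "columns" and "rows" property names are invented for the example.

```java
import java.util.Properties;

// Trimmed stand-in for IDataInput; java.util.Properties replaces AppProperties.
interface SimpleInput {
    void initialize(Properties props);
    void open();
    void close();
    boolean ready();
    String[] getColumnNames();
    Object[] getRecord();
}

// Hypothetical driver that reads its rows from a property value.
class InlineInput implements SimpleInput {
    private String[] columns = new String[0];
    private String[] rows = new String[0];
    private int next = -1; // -1 means the channel is closed

    // Configure from named properties: "columns" = a,b and "rows" = 1,2;3,4
    public void initialize(Properties props) {
        columns = props.getProperty("columns", "").split(",");
        String data = props.getProperty("rows", "");
        rows = data.isEmpty() ? new String[0] : data.split(";");
    }

    public void open() { next = 0; }

    public void close() { next = -1; }

    // For an input driver, ready() means another record is available.
    public boolean ready() { return next >= 0 && next < rows.length; }

    public String[] getColumnNames() { return columns.clone(); }

    // Fields come back in the same order as the column names.
    public Object[] getRecord() {
        if (!ready()) {
            throw new IllegalStateException("no record available");
        }
        return rows[next++].split(",", -1);
    }
}

class InlineInputDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("columns", "id,name");
        props.setProperty("rows", "1,Ann;2,Bob");

        InlineInput in = new InlineInput();
        in.initialize(props);
        in.open();
        while (in.ready()) {
            System.out.println(String.join(" | ", (String[]) in.getRecord()));
        }
        in.close();
    }
}
```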
The DataBridge package came with a set of input drivers:
- NullInput: Essentially /dev/null
- TestInput: Hardcoded input for testing
- PropertiesInput: Read input directly from the config file
- FileInputTab: Read input from a TAB file
- FileInputCSV: Read input from a CSV file
- FileInputDelim: Read input from a character-delimited file
- QueryInput: Read input from an ODBC SQL query
Those covered nearly all forms of input, but the whole point was that new drivers were easy to create if necessary.
§
All output drivers implement IDataOutput:

    public interface IDataOutput extends IDataChannel {
        static public final String GROUP_NAME = "Output";
        public void setColumnNames(String[] hdrs) throws Exception;
        public void putRecord(Object[] rcd) throws Exception;
    }
    //eof
The setColumnNames() method takes an array of strings naming output columns.
The putRecord() method takes an array of objects, each a data field. The order matches the order of the column names.
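The output side can be sketched the same way. The class below is a hypothetical character-delimited writer that collects its output in memory rather than in a file; it mirrors setColumnNames() and putRecord() but is not the actual FileOutputDelim driver, and the column-count check is my own assumption about what a driver might reasonably enforce.

```java
// Hypothetical delimited-output sketch; writes to an in-memory buffer, not a file.
class DelimOutput {
    private final char delim;
    private final StringBuilder out = new StringBuilder();
    private int columnCount = -1; // unknown until the header is set

    DelimOutput(char delim) {
        this.delim = delim;
    }

    // Write the header row and remember how many columns to expect.
    void setColumnNames(String[] hdrs) {
        columnCount = hdrs.length;
        writeRow(hdrs);
    }

    // Write one record; the field order must match the header order.
    void putRecord(Object[] rcd) {
        if (columnCount >= 0 && rcd.length != columnCount) {
            throw new IllegalArgumentException(
                    "expected " + columnCount + " fields, got " + rcd.length);
        }
        writeRow(rcd);
    }

    // Join the fields with the delimiter; null fields become empty strings.
    private void writeRow(Object[] fields) {
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append(delim);
            }
            out.append(fields[i] == null ? "" : fields[i].toString());
        }
        out.append('\n');
    }

    String text() {
        return out.toString();
    }
}
```

Passing '\t' as the delimiter gives TAB output and ',' gives a naive CSV (with no quoting), which hints at why one delimited driver can serve as the base for the others.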
The package came with a set of output drivers:
- NullOutput: The bit bucket (/dev/null)
- TestOutput: For testing an output driver
- LoggerOutput: Write output to the log file (for testing)
- ConsoleOutput: Write output to the console
- FileOutputDelim: Write output to a character-delimited file
- FileOutputTab: Write output to a TAB file
- FileOutputCsv: Write output to a CSV file
- FileOutputXml: Write output to an XML file
- FileOutputXmlSimple: Write output to XML files
- FileOutputODBC: Write output to an ODBC table
Which, again, meet most needs, but new ones are easy.
§
The mapper interface itself isn’t that different from the input and output interfaces. All mapper drivers implement IDataMap:

    public interface IDataMap extends IDataBridgeObject {
        static public final String GROUP_NAME = "Mapper";
        public void initialize(AppProperties props) throws Exception;
        public void open(IDataInput inchan, IDataOutput outchan) throws Exception;
        public void close();
        public Object[] mapRecord(String[] cols, Object[] fields) throws Exception;
    }
    //eof
Note that this interface extends IDataBridgeObject, not IDataChannel, so it must include its own versions of the initialize() method as well as the open() and close() methods.
The open() method is different from the data channels in that it takes an input data channel and an output data channel.
The mapRecord() method takes an array of field names and an array of input data fields and returns an array of mapped output data fields.
The standard map drivers were:
- Null: Do no mapping; data passes straight through
- Auto: Map input columns to output columns
- Properties: Mapping defined in config file
- File: Mapping defined in external file
Which, yet again, meet most needs.
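Here is a sketch of what a simple mapper and the surrounding Input-Process-Output loop might look like. It assumes that "auto" mapping means matching output columns to input columns by name, which is my guess for the sake of the example; AutoMap and BridgeLoopDemo are hypothetical names, and plain lists stand in for the input and output channels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical "auto" mapper: each output column takes the input field of the same name.
class AutoMap {
    private final String[] outCols;

    AutoMap(String[] outCols) {
        this.outCols = outCols.clone();
    }

    // Same shape as mapRecord(cols, fields) in IDataMap, minus the checked exception.
    Object[] mapRecord(String[] cols, Object[] fields) {
        List<String> names = Arrays.asList(cols);
        Object[] out = new Object[outCols.length];
        for (int i = 0; i < outCols.length; i++) {
            int src = names.indexOf(outCols[i]);
            // Output columns with no matching input column are left null.
            out[i] = (src >= 0 && src < fields.length) ? fields[src] : null;
        }
        return out;
    }
}

class BridgeLoopDemo {
    // Input -> Process -> Output: read each record, map it, and collect the result.
    static List<Object[]> run(String[] inCols, List<Object[]> input, AutoMap map) {
        List<Object[]> output = new ArrayList<>();
        for (Object[] rcd : input) {
            output.add(map.mapRecord(inCols, rcd));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Object[]> input = new ArrayList<>();
        input.add(new Object[] {1, "Ann"});
        input.add(new Object[] {2, "Bob"});

        AutoMap map = new AutoMap(new String[] {"name", "id"});
        for (Object[] rcd : run(new String[] {"id", "name"}, input, map)) {
            System.out.println(Arrays.toString(rcd));
        }
    }
}
```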
§ §
What makes the mapper the most complicated part is the IDataMapField interface. The source file is almost 350 lines long. (And that’s after trimming the original for possible posting here.)
The reason is that the run-time protocol for defining a mapping is fairly powerful in terms of what it can do. Each map field has nine parameters that define how it handles data.
Basically, a map field is an output column. It can come from the input fields or any other source. For example, output columns can be generated from the current date, an increasing (or decreasing) sequence number, a hard-coded value, or some combination of the above.
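The idea of an output column with its own source can be sketched in a few lines. This is not the IDataMapField interface (which is far larger); FieldSource and FieldSources are hypothetical names, and the four sources shown are just the ones mentioned above: an input field, a hard-coded value, a sequence number, and a date.

```java
import java.time.LocalDate;

// Hypothetical: one output column, computed from the current input record.
interface FieldSource {
    Object value(Object[] inputRecord);
}

class FieldSources {
    // Copy the input field at the given index.
    static FieldSource column(int index) {
        return rcd -> rcd[index];
    }

    // Always emit the same hard-coded value.
    static FieldSource constant(Object value) {
        return rcd -> value;
    }

    // Emit start, start+step, start+2*step, ... (a negative step counts down).
    static FieldSource sequence(int start, int step) {
        int[] next = {start};
        return rcd -> {
            int v = next[0];
            next[0] += step;
            return v;
        };
    }

    // Emit the given date as ISO text; pass LocalDate.now() for the current date.
    static FieldSource date(LocalDate d) {
        return rcd -> d.toString();
    }

    // Build one output record by applying one source per output column.
    static Object[] map(Object[] rcd, FieldSource... sources) {
        Object[] out = new Object[sources.length];
        for (int i = 0; i < sources.length; i++) {
            out[i] = sources[i].value(rcd);
        }
        return out;
    }
}
```

A combined field would simply be another FieldSource that calls two or more of these and joins the results.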
The mapping system was also type aware. It could handle dates as dates and numbers as numbers (floats or integers). It even knew about phone numbers.
Of course it handled various kinds of delimited files as well as fixed-column files. (In corporate life, it’s amazing how many weird legacy fixed-column file formats there are. Many are hold-overs from punch-card days. DataBridge was a life-saver time and again when it came to these.)
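For readers who have not met fixed-column files, here is a minimal sketch of slicing one line into fields by character position. The FixedColumns class and the layout used in the test are hypothetical and much simpler than what DataBridge supported.

```java
// Hypothetical fixed-column layout: each field is a [start, end) character range.
class FixedColumns {
    private final int[][] ranges;

    FixedColumns(int[][] ranges) {
        this.ranges = ranges;
    }

    // Slice one line into trimmed fields; a short line yields empty trailing fields.
    String[] parse(String line) {
        String[] fields = new String[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            // Clamp both ends so a truncated line never causes an exception.
            int start = Math.min(ranges[i][0], line.length());
            int end = Math.min(ranges[i][1], line.length());
            fields[i] = line.substring(start, end).trim();
        }
        return fields;
    }
}
```

Once a line is split this way, the fields can be handed to the mapper exactly like fields from a delimited file.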
At this point in my career, I’d been doing various kinds of data transformation tasks for many years, and I had a very good idea of what kinds of tools I wanted in my DataBridge.
§
I think I’ll leave it at that.
It’s a tool I’m very proud of. I put a lot of work into it, and it performed flawlessly. (You know, once I got the bugs out.)
What’s more, it really brought home the bacon for The Company. I’m sure they never really appreciated how much they benefited from what was basically a hobby project of mine — a tool I created to make my own life easier.
Yet, as I related last time, it ended up saving the corporate ass.
∅