DataCollector Factories

Last time I introduced the DataCollector application, but didn’t have room to get into the use of factory classes. There isn’t often a need for a factory class, but they can be useful when you need to create objects at run-time without knowing their class until then.

The general approach involves a function that returns instances of a class based on run-time information. In some cases the instances are limited to a predetermined set of classes; in other cases they can be of any class known to the code.

As a very simple example, suppose (in Python) we wanted a function that returns either a mutable list or an immutable list, depending on some run-time condition.

Here’s one way we might do it:

 def ListFactory(iterable, list_type):
     '''Creates a list of a given type.'''

     if list_type.lower() == 'mutable':
         return list(iterable)

     if list_type.lower() == 'immutable':
         return tuple(iterable)

     raise RuntimeError('Unknown type: "%s"' % list_type)

 list_items = [1,2,3,4,5,6,7,8,9,10]

 list1 = ListFactory(list_items, 'mutable')
 list2 = ListFactory(list_items, 'immutable')



Note that this is an example of a factory that provides an instance from a limited set of classes (just two: list or tuple).

The idea is that, at run-time, something would provide the list type, either “mutable” or “immutable” (maybe from a configuration file), and this would determine the kinds of lists the code creates.

A slightly more involved version looks like this:

 def ListFactoryFactory(list_type='mutable'):
     '''Returns a typed list factory.'''

     if list_type.lower() == 'mutable':
         return lambda xs: list(xs)

     if list_type.lower() == 'immutable':
         return lambda xs: tuple(xs)

     raise RuntimeError('Unknown type: "%s"' % list_type)

 factory1 = ListFactoryFactory()
 factory2 = ListFactoryFactory('immutable')

 list_items = [1,2,3,4,5,6,7,8,9,10]
 list1 = factory1(list_items)
 list2 = factory2(list_items)



(Note that we’re providing a default list type here. We could have done the same in the code above.)

Here we have a “factory-factory” that returns a function. The returned function (which takes an iterable object) is a factory that makes lists. Depending on the factory type, it makes either a list or tuple given some iterable object.

This allows multiple lists (of the same type) to be created by the factory object. We can also pass the factory object to some other function that uses it to make lists. We can change the kinds of lists that function makes depending on what kind of factory we give it.
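As a rough sketch of passing the factory object around: the `squares` function below is a made-up consumer (not from the original code) that knows only that it was handed something callable on an iterable. The factory-factory is restated in compact form so the block runs on its own.

```python
def ListFactoryFactory(list_type="mutable"):
    """Return a callable that builds lists of the requested kind."""
    kinds = {"mutable": list, "immutable": tuple}
    try:
        return kinds[list_type.lower()]
    except KeyError:
        raise RuntimeError('Unknown type: "%s"' % list_type)


def squares(numbers, make_list):
    """Hypothetical consumer: it never sees the type string or the factory-factory."""
    return make_list(n * n for n in numbers)


mutable_squares = squares([1, 2, 3], ListFactoryFactory("mutable"))
frozen_squares = squares([1, 2, 3], ListFactoryFactory("immutable"))

print(mutable_squares)  # [1, 4, 9]
print(frozen_squares)   # (1, 4, 9)
```

Swapping the factory changes what `squares` produces without touching `squares` itself.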

One could just pass the type string to the other function and let it use the type factory to get instances, but that means it has to know about the type factory class. By giving it only an object it uses in simple fashion, the other function is as isolated as possible from the factory system.

The above uses strings to steer the code to creating instances from predetermined classes. This requires the code to know about all the classes it might create instances from.

Some languages provide access to classes by name (by strings at run-time), which allows creation of instances based on class name strings. In this case, a type factory need not know all possible classes.
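In Java this is typically done with Class.forName(). A comparable sketch in Python (to stay with the language of the examples above) uses importlib. The `load_class` helper is my own illustration, and the class name is hard-coded here where a real program would read it from configuration.

```python
import importlib


def load_class(dotted_name):
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_name, _, class_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError("Expected 'module.ClassName', got %r" % dotted_name)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# The string would normally come from a config file.
cls = load_class("collections.OrderedDict")
instance = cls(a=1, b=2)

print(type(instance).__name__)  # OrderedDict
```

Note that `load_class` contains no list of known classes; anything importable at run-time can be named.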

The DataBridge application used this capability (in Java). The desired Input, Mapper, and Output driver classes are named in the configuration file. The application creates instances of those classes at run-time and “plugs them into” the data flow.
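To give a feel for the plug-in idea, here is a toy sketch in Python. It is not the DataBridge code: the three driver classes, their method names, and the CONFIG dictionary (standing in for the configuration file) are all invented for illustration.

```python
class ListInput:
    """Hypothetical input driver: supplies rows from memory."""
    def read(self):
        return [{"name": "ada"}, {"name": "grace"}]


class UpperMapper:
    """Hypothetical mapper driver: upper-cases every value."""
    def map(self, row):
        return {key: value.upper() for key, value in row.items()}


class CollectOutput:
    """Hypothetical output driver: collects rows instead of writing them."""
    def __init__(self):
        self.rows = []

    def write(self, row):
        self.rows.append(row)


# Stand-in for the configuration file: role -> class name.
CONFIG = {"Input": "ListInput", "Mapper": "UpperMapper", "Output": "CollectOutput"}


def make_driver(role, config):
    """Look up the class named for this role and instantiate it."""
    return globals()[config[role]]()


def run(config):
    source = make_driver("Input", config)
    mapper = make_driver("Mapper", config)
    sink = make_driver("Output", config)
    for row in source.read():
        sink.write(mapper.map(row))
    return sink.rows


result = run(CONFIG)
print(result)  # [{'name': 'ADA'}, {'name': 'GRACE'}]
```

The `run` function never mentions a concrete driver class; changing a name in the configuration changes the data flow.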

The DataCollector application uses the same Input, Mapper, and Output driver class architecture, which loads classes by run-time strings, but it also uses a factory architecture for the web services stuff.

There were several reasons for this:

Firstly, there are tools that take WSDL files as input and generate a series of Java classes that implement the web services requests and replies. These classes can include the request/response protocol, so they do a lot of the work necessary.

If there are six record types and five request types (see previous post) that means there are 30 sets of classes. “Sets of classes” because each request type of each record type has several classes associated with it (request, response, comm layer, etc). Creating these by hand would be tedious and difficult, so the conversion tools are important.

The end result is that there is an automation that binds the desired run-time actions with specific names of record types and fields.

Secondly, the whole point is an architecture that allows selection of these classes at run-time based on string input. Therefore, some sort of factory is necessary.

Thirdly, there is the potential of adding new record types or actions by only reconverting the WSDL files (which can be done with a single command). That makes it possible to extend the application’s scope without rewriting any code.

Or, as it turns out in the real world, without rewriting a lot of code.

Fourthly, flexibility. By deferring as much as possible to run-time, and by allowing new classes to be added without changing the existing application, it often happens the application has better longevity or scope than expected at design time.

Both DataBridge and DataCollector surprised me this way. Both more than paid for their development time. (Many times over, I’d say.)

Last time, I showed you a snippet that illustrates how DataCollector worked:

TypeFactoryFactory FF = new TypeFactoryFactory();

ITypeFactory TF = FF.getTypeFactory(record_type);
IQuery qry = TF.getQuery();
IRecord rcd = TF.getRecord();
//

This uses the factory-factory architecture. In fact, it takes it one step further by making the TypeFactoryFactory itself an instance of a class.

That adds another layer of flexibility, since we could pass the code an instance of a different TypeFactoryFactory class so long as that class implemented or inherited the getTypeFactory() method.

For maximum flexibility, the TypeFactory, Query, and Record types are all interfaces rather than classes. This allows use of any class hierarchy so long as the class implements the interface.
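The same layering can be sketched in Python, using typing.Protocol in place of Java interfaces. This is only an outline of the shape: DictRecord and AccountFactory are placeholder classes of my own, the method names are Python-style renderings of the Java ones, and the query interface is left out to keep it short.

```python
from typing import Protocol


class IRecord(Protocol):
    def set_field(self, name: str, value: str) -> None: ...
    def get_field(self, name: str) -> str: ...


class ITypeFactory(Protocol):
    def get_record(self) -> IRecord: ...


class DictRecord:
    """Placeholder record: satisfies IRecord without inheriting from it."""
    def __init__(self):
        self._fields = {}

    def set_field(self, name, value):
        self._fields[name] = value

    def get_field(self, name):
        return self._fields[name]


class AccountFactory:
    """Placeholder type factory for a single record type."""
    def get_record(self):
        return DictRecord()


class TypeFactoryFactory:
    """Maps a record-type string to a type factory instance."""
    _factories = {"Account": AccountFactory}

    def get_type_factory(self, record_type: str) -> ITypeFactory:
        try:
            return self._factories[record_type]()
        except KeyError:
            raise ValueError("Unknown record type: %r" % record_type)


def new_record(factory_factory, record_type):
    """Client code: relies only on the method names, not on concrete classes."""
    return factory_factory.get_type_factory(record_type).get_record()


rec = new_record(TypeFactoryFactory(), "Account")
rec.set_field("AccountName", "Acme")
print(rec.get_field("AccountName"))  # Acme
```

Because `new_record` takes the factory-factory as an argument, any object with a compatible `get_type_factory` method could be passed in its place.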

Putting it all together, it allows writing a small application like this:

TypeFactoryFactory FF = new TypeFactoryFactory();
ITypeFactory TF = FF.getTypeFactory(record_type);

// Create the query...
IRecord recd = TF.getRecord();

// Selected fields are requested with empty values
for (var f : select_fields) {
    recd.setField(f.name, "");
}
// Filter fields carry the values to match
for (var f : where_fields) {
    recd.setField(f.name, f.value);
}
// Make the query...
IQuery qry = TF.getQuery();
qry.login(UserName, Password);

IRecord[] data = qry.get(recd);
qry.logout();

// Process the query result...
SortRecords(data, orderby_fields);

for (IRecord d : data) {
    System.out.println("Record:");
    for (var f : select_fields) {
        System.out.println(d.getField(f.name));
    }
}

Again there is a punt on the sorting. (Assume a function that, given the array of records and a list of fields within those records, sorts the array using those fields.)

The code above assumes four inputs:

record_type: (string) The name of the table to query.
select_fields: (list) List of fields we want in the response.
where_fields: (list) List of fields with values for filtering.
orderby_fields: (list) List of fields for sorting.

(The field lists are, in all cases, lists of string pairs: a name and a value.)

This is very much the sort of information contained in an SQL query:

SELECT select_fields
FROM record_type
WHERE where_fields
ORDER BY orderby_fields

Which was the main goal of the project.
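To make the correspondence concrete, here is a small Python sketch that renders the four inputs as SQL-like text. The `to_sql` function is purely illustrative (DataCollector talked to web services, not to a database), it assumes equality filters only, and it builds the string for display, not for execution against a real database.

```python
def to_sql(record_type, select_fields, where_fields, orderby_fields):
    """Render the four inputs as SQL-like text; fields are (name, value) pairs."""
    sql = "SELECT %s FROM %s" % (
        ", ".join(name for name, _ in select_fields),
        record_type,
    )
    if where_fields:
        sql += " WHERE " + " AND ".join(
            "%s = '%s'" % (name, value) for name, value in where_fields
        )
    if orderby_fields:
        sql += " ORDER BY " + ", ".join(name for name, _ in orderby_fields)
    return sql


query_text = to_sql(
    "Account",
    [("AccountName", ""), ("Location", "")],
    [("Address.State", "MN")],
    [("AccountName", "")],
)
print(query_text)
# SELECT AccountName, Location FROM Account WHERE Address.State = 'MN' ORDER BY AccountName
```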

Here, as with DataBridge, I used property files as input to configure what the program did. Creating a query (say for the code above) was as simple as:

Entity = Account
Select = \
    AccountName, \
    Location, \
    Description, \
    Address.Address, \
    Address.City, \
    Address.Country
Filter = \
    [CreateDate] > '5/1/2007', \
    [Address.State] = 'MN'

Which is a lot like an SQL query. This would retrieve a list of Account records created since 5/1/2007 where the Address.State is Minnesota.

The same code could equally query for Contacts or any other record type and list any of the fields in the requested data.
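For the curious, reading a property file like the one above takes very little code. The sketch below is a simplified Python parser of my own, not the Java Properties loader the application would have used: it handles only "key = value" lines, backslash continuations, and "#" comments, and the real format has more rules than that.

```python
def parse_properties(text):
    """Parse 'key = value' lines, joining lines that end in a backslash."""
    props = {}
    logical = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            logical += line[:-1]  # continuation: keep accumulating
            continue
        logical += line
        if logical and not logical.startswith("#"):
            key, _, value = logical.partition("=")  # split on the first '=' only
            props[key.strip()] = value.strip()
        logical = ""
    return props


def split_list(value):
    """Split a comma-separated property value into trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


QUERY = r"""
Entity = Account
Select = \
    AccountName, \
    Location, \
    Description, \
    Address.Address, \
    Address.City, \
    Address.Country
Filter = \
    [CreateDate] > '5/1/2007', \
    [Address.State] = 'MN'
"""

props = parse_properties(QUERY)
select_names = split_list(props["Select"])
filters = split_list(props["Filter"])

print(props["Entity"])   # Account
print(len(select_names))  # 6
print(filters)
```

Splitting on the first "=" only is what keeps the "=" inside the Filter expressions intact.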

Speaking of fields, the last aspect of the factory architecture is that I needed a third type of factory to generate instance fields.

The record types generally have a lot of fields — hundreds. A fully populated object was problematic in some cases, let alone an array of them. There was just no way to use fully defined record types.

So, instead, we define empty record types and, at run-time, add only the fields actually needed for the request. As the example above illustrates, most requests only use a handful of fields. The space savings are phenomenal.

This being Java, for every field we add, we also add a getter and setter function. The field name, and the access methods, follow a specific naming convention client code can anticipate based on the WSDL names.

Code that uses this library can access fields this way, but the preferred method (illustrated in the code above) is to use the getField(name) and setField(name, value) methods, which are always present. These allow access to fields without knowing the field name ahead of time.

(The code above should make better sense now.)
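The sparse-record idea is easy to mimic in Python. The SparseRecord class below is a stand-in I made up to show the principle, not the generated Java classes: the record starts empty and holds only the fields a request actually adds.

```python
class SparseRecord:
    """Stand-in record that stores only the fields added at run-time."""

    def __init__(self, record_type):
        self.record_type = record_type
        self._fields = {}  # starts empty, whatever the full schema looks like

    def set_field(self, name, value):
        self._fields[name] = value

    def get_field(self, name):
        return self._fields[name]

    def field_names(self):
        return sorted(self._fields)


record = SparseRecord("Account")
record.set_field("AccountName", "")
record.set_field("Address.State", "MN")

print(record.field_names())               # ['AccountName', 'Address.State']
print(record.get_field("Address.State"))  # MN
```

However many fields the full record type defines, this object carries two.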

This project turned out well. The irony is it was done under pressure after the first programmer given the project turned out not to be up to the job. I had to start fresh with clients already expecting a deliverable.

The double irony is it wasn’t the first time I stepped into a project that had floundered at the hands of someone incapable of the task and had to deliver results ASAP. Both projects turned out really well, so apparently I work well under pressure.

Nice to know. 😀

The Hard-Core Coder

~ I can't stop writing code!
