In the first two parts of this series I’ve introduced state engines and taken apart a specific instance of an engine. Now it’s time to tie together the design idea with approaches to building a variety of such engines.
Because the programming logic is in the state table, the engine can be fairly generic. That means it’s possible to create a state engine framework you can reuse for a variety of applications.
A state engine generally consists of three parts:
- The Driver Loop (the actual “engine”)
- The State Table
- The Process Methods & Variables
Typically, you implement the Driver Loop code fully within the framework. It’s the one piece that is entirely generic. The State Table is the opposite; other than the format of the table, all content is specific to the application. The Process Methods & Variables are also application-specific, but are often comprised of generic, reusable pieces.
I showed you Python code for a Driver Loop in both previous posts. Here’s a more generic version:
function StateEngine (sourceText, stateTable): state = ALPHA token = sourceText.nextToken() while token.isValid: state_record = stateTable.findState(state, token) if state_record.hasFunction: state_record.callFunction(token) state = state_record.nextState token = sourceText.nextToken() return (state,token)
while loop drives the process. The
sourceText object returns consecutive tokens from the input stream. Depending on the process, those tokens might be characters, words, or some other form of input token.
Each token drives a state transition from the current state to the next state. Crucially, the next state depends on both the current state and the token. The
findState() method uses those to find the matching record (row) in the State Table.
State transitions often have actions implicit in the transition. For example, you might need to add the current token to a buffer that’s accumulating a sequence of some sort (building a string, for example). The state record contains a function reference the engine invokes (or not if no function). Because those functions often need the current token, we supply it as a parameter.
Returning the current state and token allows the calling process to determine exactly what caused the engine to exit. The main difference being, did it end successfully on an EOT (end of text) or did it encounter a problem and exit early.
The State Table turns out to be the most dependent on the language. Some languages make it very easy, others make it more complicated. As we’ve seen, Python’s
dict data type makes it very easy. A language, such as C, lacking in forward reference and rich data types makes it more complicated.
For example, in C one might do something like this:
static int StateTable  = [ [STATE_ALPHA , SPC, STATE_ALPHA , NULL] , [STATE_ALPHA , '#', STATE_COMMENT, NULL] , [STATE_ALPHA , ';', STATE_COMMENT, NULL] , [STATE_ALPHA , ANY, STATE_TEXT , FTN_START_TEXT] , [STATE_COMMENT, EOL, STATE_ALPHA , NULL] , [STATE_COMMENT, ANY, STATE_COMMENT, NULL] , [STATE_TEXT , EOL, STATE_ALPHA , FTN_END_TEXT] , [STATE_TEXT , ANY, STATE_TEXT , FTN_ADDTO_TEXT] ]
All the UPPERCASE words are
enum values. The table is really an array of four-member sub-arrays of integers.
Each four-item array is a state record. The items are:
- Current State
- Token-Match Value
- Next State (if match)
- Transition Function
The table implements a very simple parser that reads a text stream as a series of lines. It ignores “comments” — lines that begin with a semi-colon (‘;’) or hash-sign (‘#’). It also strips leading spaces. (This version does not detect comments once a line of text begins.)
In this version, the Token-Match value is either a valid character (as an
int) or a special value (presumably out-of-band with regard to the set of valid character values). One might use negative values or positive integer values guaranteed not to represent valid character values.
There are three states:
- Alpha (start & default state)
- in Comment
- in Text
In this case, the entire table consists of pre-defined integer values or character values. Implementing and maintaining this design requires keeping those definitions current and in sync.
It also means the engine needs to translate those integer values or, at least, be aware of them. The current and next state values work great as integers, but the function and token-match values require translation of some kind.
You can use cross-reference tables to link defined values. One trick, if speed matters, is an initializer step you call first. This step actually modifies the table, converting defined values to function references or other immediately useful value.
Process Methods & Variables
Implementing the actual process code is also very language-specific. And it’s the part of the engine with the most potential variation in design approach.
In the C example table above, the process requires only three functions:
- Start buffering a line of text.
- Add another character to the buffer.
- End of line; process the buffer.
(Comments are ignored, so there is no need to buffer them.) These three functions are extremely common with state engines. The serial nature of token processing leads directly to the three actions above. It’s simply: beginning; middle; end.
And, obviously, a buffer (a variable) of some kind is required. You end up with something along these lines:
char buffer  int buffer_count = 0 function StartText (token): buffer = token buffer_count = 1 function AppendText (token): buffer[buffer_count] = token buffer_count = buffer_count + 1 function EndText (token): buffer[buffer_count] = 0 SendToOutput(buffer)
Note how the EndText() function doesn’t use the passed token. In this process, an EOL (end of line) token ends the text, but is not part of the text, so is not added to the buffer. The zero that is written acts as a string terminator.
SendToOutput() method is defined to handle the output strings of this process. Perhaps they are printed or saved to a file or parsed for commands.
How you implement all this is the most “dealer’s choice” part of things. In object-oriented languages, sometimes all you do is define a base class and then create a sub-class for every state engine.
Outside of OO languages, you often do end up implementing the Process code for each process. The tendency towards repeated actions allows some cut’n’paste from previous projects at the very least. Depending on the nature of your work, some sort of framework or reusable code may be possible.
Ultimately, what a State Engine does is break down a process in a very different way than you might be used to. It does take a slight shift in thinking to see your process as a set of small functions performed during state transitions.
(It’s possible those used to designing for the web may find this easier, as the request-response model of web interaction is usually founded on a state table. The big difference with a web design is the lack of the Driver Loop.)
That’s the end of this series for now. I may return to the topic to explore some of the details and variations in the future, but if you’re never encountered state engines before, you should have enough now to begin using them.