A long, long time ago I came up with a simple something I called Definition Language (DL) and an extension of that I called Data Definition Language (DDL). This was before XML (let alone JSON) became popular, and DL and DDL turned out to be somewhat akin to those.

My intention was a configuration language for a data-dumping tool, one that would let the tool know the structure of the data it was dumping. Debuggers can sometimes do that in context. I wanted a tool that could do that with any file format, given a DDL config file describing it. (These days I’d probably just use XML.)

Definition Language is so simple, and uses such a common basic syntax that I can’t claim any credit for designing (let alone inventing) it. The basic — and only — unit, as the name suggests, is a {definition}:

{definition} := {tag} {name} {content} END {tag}

Note that the END keyword (the only one in DL) should not be case-sensitive. There is an abbreviated syntactical sugar form:

{definition} := {tag} {name} {content} ;

Which allows short definitions to not have to repeat the closing {tag}.

The {tag} and {name} components can be any string of characters that follows general variable-naming conventions. Exactly what those are is implementation dependent; DL makes no assumptions other than that the strings can contain no spaces or DL quoting characters (which are discussed below).

Definitions can be nested. The {content} component is defined:

{content} := {attributes} | {definitions}

The nesting comes from {definitions}, which is just:

{definitions} := <null> | {definition}{content}

The {attributes} are similarly defined:

{attributes} := <null> | {attribute}{content}

So {content} is zero or more {attribute}s and/or {definition}s in any order.

An {attribute} is defined:

{attribute} := {tag} : {name}

With {tag} and {name} following the same convention used for a {definition}.
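
The grammar so far is small enough to sketch as a recursive-descent parser. What follows is only an illustration in Python, not the original implementation: the names Attribute, Definition, tokenize and parse_definition are made up for the example, quoting and comments are ignored, and it assumes the closing {tag} must match the opening one exactly (only END is treated as case-insensitive).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Attribute:
    tag: str
    name: str


@dataclass
class Definition:
    tag: str
    name: str
    content: list = field(default_factory=list)  # Attributes and Definitions, in source order


def tokenize(text):
    # ":" and ";" are tokens of their own; anything else is a run of
    # non-space characters. Quoting and comments are NOT handled here.
    return re.findall(r"[:;]|[^\s:;]+", text)


def parse_definition(tokens, pos=0):
    """Parse one {definition} at tokens[pos]; return (Definition, next_pos)."""
    try:
        node = Definition(tokens[pos], tokens[pos + 1])
        pos += 2
        while True:
            tok = tokens[pos]
            if tok == ";":                       # short form ends with ";"
                return node, pos + 1
            if tok.upper() == "END":             # long form: END {tag}
                if tokens[pos + 1] != node.tag:  # assumption: exact tag match
                    raise SyntaxError(
                        f"END {tokens[pos + 1]} does not close {node.tag}")
                return node, pos + 2
            if tokens[pos + 1] == ":":           # {attribute} := {tag} : {name}
                node.content.append(Attribute(tok, tokens[pos + 2]))
                pos += 3
            else:                                # otherwise a nested {definition}
                child, pos = parse_definition(tokens, pos)
                node.content.append(child)
    except IndexError:
        raise SyntaxError("unexpected end of input") from None
```

Calling parse_definition(tokenize(text)) returns the outermost Definition plus the index of the first unread token. One token of lookahead is enough: if a ":" follows the {tag}, it is an attribute; otherwise it starts a nested definition.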

Lastly, a comment is defined:

{comment} := -- {any-text} <EOL>

Comments begin with a double hyphen (--), may appear anywhere outside quoted text, and extend to the end of the line. They are the only aspect of DL that is line-sensitive (outside of quoted strings).
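
As a rough illustration of "anywhere outside quoted text", here is a Python sketch that removes a comment from a single line. It assumes the comment marker is the double hyphen used in the example file further down, handles only single and double quotes, and does not track quoted strings that span lines; strip_comment is a hypothetical helper name.

```python
def strip_comment(line, marker="--", quotes="'\""):
    """Return the line without its trailing comment, leaving quoted text alone."""
    in_quote = None                      # the quote character we are inside, if any
    for i, ch in enumerate(line):
        if in_quote:
            if ch == in_quote:           # closing quote
                in_quote = None
        elif ch in quotes:               # opening quote
            in_quote = ch
        elif line.startswith(marker, i): # comment starts here
            return line[:i].rstrip()
    return line.rstrip()
```

A single hyphen inside a name such as 2D-point is left alone, but under this assumption a name containing two consecutive hyphens would be cut short, so a real implementation might require whitespace before the marker.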

The final wrinkle is that a {name} can be any of several types of quoted string. This is mainly for use in attributes, but can be used in the {definition} {name} if desired. (It’s not considered good style, though.) The usual single- and double-quoting characters act as in any language: everything quoted is part of the string, including leading or trailing whitespace, tabs, and line-end characters.

DL also defines the < and > characters as matched quoting characters. They strip leading and trailing whitespace and take special care in recognizing line endings, and they may be bound to special string processing. They are intended for embedded multi-line text content. (They’re similar to Python’s triple-quote feature.)

An optional extension: A {name} may contain embedded value references in the form %expression% where expression is anything the interpreter can resolve and evaluate (this is implementation dependent). In one simple form, expression is the name of some variable that’s in scope; an environment variable, for instance. In more complicated forms, expression might be a math expression or contain function references.
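
The simple form, where expression is just a variable name, can be sketched in a few lines of Python. This is an assumption-laden illustration: expand is a hypothetical helper, the scope defaults to the process environment, and an unknown reference is left in the text as written (an implementation could just as reasonably raise an error).

```python
import os
import re


def expand(name, scope=None):
    """Replace each %variable% in name with its value from scope."""
    scope = os.environ if scope is None else scope

    def lookup(match):
        key = match.group(1)
        # Unknown variables are left untouched, including the % signs.
        return str(scope.get(key, match.group(0)))

    # A reference is a run of non-space, non-% characters between two % signs.
    return re.sub(r"%([^%\s]+)%", lookup, name)
```

For example, expand("%ROOT%/data", {"ROOT": "/srv"}) gives "/srv/data". Math expressions or function references would need a real expression evaluator in place of the dictionary lookup.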

And that’s DL.

§

Here’s an example of a DL file:

module Demo
   version: 1.0          -- attribute
   access: public        -- attribute
   -- define a 2D-point...
   class 2D-point
      integer X size:32 signed:TRUE value:0 ;
      integer Y size:32 signed:TRUE value:0 ;
      string tag value:"" ;
   end class
   -- define a 3D-point...
   class 3D-point
      inherit: 2D-point
      integer Z size:32 signed:TRUE value:0 ;
   end class
end module

It defines a 2D XY point and a 3D XYZ point that inherits the XY from 2D-point. Note there’s no problem with the hyphen in the names. DL doesn’t do math, so it’s never seen as anything but a name character.

One thing that stands out is the lack of methods. DL is about defining structure, and it doesn’t have a good facility for defining code. Or any text, really. Generally it’s bad at content, because the only content it can have is more nested definitions.

However, the < and > quoting is intended to allow some facility for text content:

class 2D-point
   integer X size:32 signed:TRUE value:0 ;
   integer Y size:32 signed:TRUE value:0 ;
   method Move
      input: x,y
      returns: self
      code:<
         self.x = x
         self.y = y
         return self
      >
   end method
end class

The angle brackets can be replaced with other characters if desired. A common replacement is { and }. It depends a bit on what kind of content needs to be quoted. To keep single characters such as < and > free for use inside the quoted content, use multi-character delimiters (as Python does with triple quotes). One simple extension is just << and >> (or even <<< and >>>, if necessary).

A system like the one above would no doubt have special handling for code attributes. All that leading space might get stripped off, for instance. It might parse and execute the code text itself or pass it to another interpreter, depending on which programming language was used.
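
Stripping the leading space is easy to sketch with Python's standard textwrap module. The helper name clean_block is hypothetical, and the choice to normalize line endings to "\n" is an assumption about what "special pains with line-endings" could mean in practice.

```python
import textwrap


def clean_block(raw):
    """Tidy the raw text found between < and >."""
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Remove the indentation shared by all lines, then the outer whitespace.
    return textwrap.dedent(text).strip()
```

Applied to the body of the code attribute above, this yields three lines starting in column one. Relative indentation inside the block is preserved, which matters if the embedded language is itself indentation-sensitive.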

§

Data Definition Language is an extension of DL that specifies various {tag}s and {attribute}s for data dumping. For instance, it defines CATALOG, LIBRARY, FILE, and GROUP for file and definition management.

The data definitions are a set of nested blocks with terminal definitions using one of the native data keywords: BYTE, SHORT, LONG, WIDE, SINGLE, DOUBLE, BITFIELD, BITVALUE, BITFLAG, DEF, REPEAT, SPAN.

The last three allow complex data definitions.

A simple DDL file might look something like:

CATALOG Demo
   LIBRARY Examples
      FILE blog-post-demo
         GROUP GIF
            BYTE Header
               count: 6
               require: < 47 49 46 38 39 61 >
            END BYTE
            SHORT Screen-Width  endian:little ;
            SHORT Screen-Height endian:little ;
            BYTE GCT-flag ;
            BYTE BG-Color ;
            BYTE PAR ;
            REPEAT Color-Table
               count: 256
               BYTE color count:3 ;
            END REPEAT
         END GROUP
      END FILE
   END LIBRARY
END CATALOG

Which defines the beginning of a GIF file. The various {name}s used in the definition would be available to some process that opened a GIF file and used this DDL to process it.

As mentioned above, the original intent was for structured data-dumping of arbitrary files given a DDL file describing them. Those {name}s become the labels of the displayed data. However, some other application could use a DDL reader to parse and use arbitrary file formats.
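
To give a feel for what such a dumper might do, here is a Python sketch that walks the first few fields of the GIF definition above. It is not a DDL reader: the field table is hand-written to mirror the DDL, and it assumes SHORT means an unsigned 16-bit integer and BYTE an unsigned 8-bit one, which the text above never pins down. FIELDS, REQUIRED and dump are hypothetical names.

```python
import struct

# (label, struct format) pairs mirroring the DDL; "<" means little-endian.
FIELDS = [
    ("Header",        "6s"),  # BYTE Header count:6
    ("Screen-Width",  "<H"),  # SHORT ... endian:little (assumed unsigned 16-bit)
    ("Screen-Height", "<H"),
    ("GCT-flag",      "B"),   # BYTE (assumed unsigned 8-bit)
    ("BG-Color",      "B"),
    ("PAR",           "B"),
]

# The require: attribute of Header, written as hex bytes in the DDL.
REQUIRED = bytes.fromhex("47 49 46 38 39 61")


def dump(data, fields=FIELDS):
    """Return (label, value) rows for the leading fields of data."""
    rows = []
    offset = 0
    for label, fmt in fields:
        (value,) = struct.unpack_from(fmt, data, offset)
        rows.append((label, value))
        offset += struct.calcsize(fmt)
    if rows[0][1] != REQUIRED:
        raise ValueError("Header does not match the required bytes")
    return rows
```

Each {name} from the DDL ends up as the label of a row. A real tool would build the field table by reading the DDL file rather than hard-coding it, and would also need to handle REPEAT blocks such as the color table.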

§

I never formally locked down an official definition of DDL. It remained one of those ideas I never got around to pursuing. Lately it’s been popping into mind, and I’ve toyed with the idea of developing it at long last.

Or at least the idea of a structured data-dumping tool. (Although, to be honest, my actual use for such a thing these days is pretty limited.) I might consider XML for the file definition language, although DDL is a bit cleaner to read. I’ve also never been thrilled with XML’s content model, the mix of text and nested tags, which always raises the question of where text goes: in an attribute or in tag content. It might be interesting to consider JSON, though.

Food for thought.

§ §

For now I just wanted to introduce DL and DDL. I may return to the subject later, especially DDL.

Until then, keep coding!

Ø