One characteristic of the hard-core coder is a love of computer languages, programming languages in particular. Programmers of that ilk — of my ilk — collect new languages like merit badges. (I get a kick out of saying that I’ve programmed “from Ada to the Z-80!”)

The especially far gone of us also enjoy creating new languages (or in some cases, new dialects of XML). Lately I’ve been playing with a language design that follows a favorite theme: the single-syntax construction language.

In a single-syntax construction language, there is only one statement, or construction, syntax. All statements in the language have the same form!

To start, consider how the statement syntax varies for different constructions:

int x = 42;
while (x < 100) {
    print(x);
}
for (x=0; x < 10; x++) {
    total += x < 5 ? foo[x] : bar[x];
}

The above code fragment demonstrates five different syntax forms: a variable definition, a while loop, a function call, a “for” loop, and an assignment. Not to mention the ternary, increment, and array-index operators.

For comparison, consider a language such as Lisp which has one basic syntactical form, the list:

(item1 item2 item3 ... itemN)

An important variation of this list is the executable statement:

(keyword item1 item2 ... itemN)

The keyword defines an action the Lisp engine takes with the items in the list. For example, to add two numbers:

(+ 42 21)

A nice benefit is that the same construct sums a list of numbers:

(+ 10 21 32 43 54 65 76 87 98)

In a Lisp-style syntax, the previous code fragment might look like this:

(var x 42)
(while (lt x 100) ((print x)))
(for ((set x 0) (lt x 10) (incr x))
    (set total (if (lt x 5) (index foo x) (index bar x))))

Now each part of the code has the same form: a list. Even expressions now come in list form.

All of which serves to introduce the single-syntax “LPL” (Little Programming Language) that I’ve been toying with lately. It’s intended as a query language for the baseball suite I’ve developed in Python since 2010.

Currently I use the suite to generate a lot of HTML pages for my baseball stats website. I wrote the suite ad hoc without a clear vision of what was possible or how to accomplish it. One result of that is query code mixed with presentation code.

That’s bad enough, but the suite also suffers from a need to re-query for similar or identical results for each page it generates. A better design would allow a single query to be the data source for multiple output pages. (And that automatically separates the query from the presentation.)

As part of a gradual re-factoring of the suite, I’ve been toying with the idea of a query language front-end for the query part of the suite. A sort of SQL for pyBBGD (Python BaseBall GameDay).

Today I just want to document what I think is a final take on the query language. In future posts I’ll write about the Python code that parses the language.

Each statement has the form:

[name:]keyword(description)

It’s a lot like Lisp, except that the keyword is outside the parentheses. The description is zero or more statements of the same form. Commas, which are treated as whitespace, may be used or not. Each keyword can have an optional name that can be used to refer to the object.
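
As an illustration only (not the parser from the suite), here is a minimal Python sketch of how the [name:]keyword(description) form could be read into a tree. The Node class and the parse function are hypothetical names, and the sketch assumes that anything inside the parentheses that is not itself a statement is kept as a bare word for the keyword to interpret later.

```python
import re
from dataclasses import dataclass, field

# Tokens: "(", ")", ":" or a run of characters that is none of those.
# Commas are skipped along with whitespace, as the language allows.
TOKEN = re.compile(r"[():]|[^\s,():]+")


@dataclass
class Node:
    """One statement: optional name, keyword, and the items in its parentheses."""
    keyword: str
    name: str = ""
    items: list = field(default_factory=list)


def parse(text):
    """Parse source text into a list of Node objects (hypothetical helper)."""
    tokens = TOKEN.findall(text)
    items, pos = _parse_items(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected ')'")
    return items


def _parse_items(tokens, pos):
    """Collect items until a ')' or the end of input; return (items, position)."""
    items = []
    while pos < len(tokens) and tokens[pos] != ")":
        word = tokens[pos]
        if word in ("(", ":"):
            raise SyntaxError(f"unexpected {word!r}")
        pos += 1
        name = ""
        # "name:keyword(...)": the word before the colon is the optional name.
        if pos < len(tokens) and tokens[pos] == ":":
            if pos + 1 >= len(tokens) or tokens[pos + 1] in ("(", ")", ":"):
                raise SyntaxError("expected a keyword after ':'")
            name, word = word, tokens[pos + 1]
            pos += 2
        if pos < len(tokens) and tokens[pos] == "(":
            children, pos = _parse_items(tokens, pos + 1)
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # step over the closing parenthesis
            items.append(Node(word, name, children))
        elif name:
            raise SyntaxError("a named keyword needs parentheses")
        else:
            items.append(word)  # bare word, left for the keyword to interpret
    return items, pos
```

Splitting the content into words loses the original spacing and any commas, so a real implementation that wants exact string content would need to capture the raw text between the parentheses instead.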

For example, a typical query might look like this:

name(DS1) use(MSB)
season(2014 regular)
filter(GameType(regular))
filter(GameStatus(final))

This queries for all completed regular games during the 2014 regular season.

The following keywords are definitely defined:

  • name(string) – name for the dataset
  • use(keyword) – set dataset type (msb, box, event, pitch)
  • date([year] month day) – add one date to the query date list
  • dates(date1 date2) – add range of dates to the date list
  • month([year] month) – add range of dates to the date list
  • season([year] [regular]) – add range of dates to date list
  • season([year] post) – add range of dates to date list
  • season([year] keyword) – add single date to date list
  • year(year) – sets the default year (doesn’t add to date list)
  • filter(criteria) – sets a query filter; multiple filters AND together
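
To make the "multiple filters AND together" rule concrete, here is a small Python sketch. It assumes each game is a plain dict keyed by field names such as GameType and GameStatus; that record shape, and the make_filter and apply_filters helpers, are hypothetical and only for illustration.

```python
def make_filter(field_name, wanted):
    """Build a predicate that is true when a game's field equals `wanted`."""
    return lambda game: game.get(field_name) == wanted


def apply_filters(games, filters):
    """Keep only the games that satisfy every filter (logical AND)."""
    return [game for game in games if all(f(game) for f in filters)]


# Hypothetical records standing in for real query results.
games = [
    {"id": 1, "GameType": "regular", "GameStatus": "final"},
    {"id": 2, "GameType": "regular", "GameStatus": "postponed"},
    {"id": 3, "GameType": "spring", "GameStatus": "final"},
]

filters = [
    make_filter("GameType", "regular"),   # filter(GameType(regular))
    make_filter("GameStatus", "final"),   # filter(GameStatus(final))
]
kept = apply_filters(games, filters)
```

With no filters at all, all() is true for every game, so an unfiltered query returns everything.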

The season keywords are:

  • first – first day of the season
  • last – last day
  • post_first – first day of post-season
  • post_last – last day
  • asg – date of the All-Star game
  • asb_first – last day before ASG
  • asb_last – first day after ASG

The year statement sets the default year (which is the current year otherwise). The year can optionally be specified explicitly in the date, month, and season statements. The dates statement takes two items, a first date and a last date, each of which is a date or season statement (a season statement used here must be one that returns a single date).
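
A rough Python sketch of how the date and month statements might expand into a date list, with the optional leading year falling back to the default. The function names are hypothetical, and the sketch assumes months and days are written as numbers. The season statements are left out because they depend on schedule data.

```python
import calendar
from datetime import date, timedelta


def date_range(first, last):
    """Every day from first through last, inclusive."""
    span = (last - first).days
    return [first + timedelta(days=n) for n in range(span + 1)]


def split_year(args, default_year, count):
    """Peel off an optional leading year; `count` is the number of other items."""
    numbers = [int(arg) for arg in args]
    if len(numbers) == count + 1:
        return numbers[0], numbers[1:]
    if len(numbers) == count:
        return default_year, numbers
    raise ValueError(f"expected {count} or {count + 1} items, got {len(numbers)}")


def expand_date(args, default_year):
    """date([year] month day): a single date."""
    year, (month, day) = split_year(args, default_year, 2)
    return [date(year, month, day)]


def expand_month(args, default_year):
    """month([year] month): every day of that month."""
    year, (month,) = split_year(args, default_year, 1)
    last_day = calendar.monthrange(year, month)[1]
    return date_range(date(year, month, 1), date(year, month, last_day))


def expand_dates(first, last):
    """dates(date1 date2): both ends are already resolved to single dates."""
    return date_range(first, last)
```

For example, expand_month(["2014", "4"], 2010) yields the thirty days of April 2014, while expand_month(["4"], 2010) uses the default year instead.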

I like the Lisp idea of lists, but I also like the function signature, f(). This syntax also has the advantage of doing away with strings as a distinct, marked, data type. The parentheses act as quote marks.

For example, in the following:

name(My Data Set) use(MSB) season(2014 post)

The MSB and post are keywords, while “My Data Set” is a string, and “2014” is a number. The implementations of name, use, and season know what to look for.
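
The idea that each keyword's implementation decides how to read its own content can be sketched in Python as a table of handlers. Everything here is an assumption made for illustration: the handler names, the query dict they fill in, the input shape (keyword plus a list of words), and the choice to match keywords case-insensitively.

```python
DATASET_TYPES = {"msb", "box", "event", "pitch"}
SEASON_KEYWORDS = {
    "regular", "post", "first", "last",
    "post_first", "post_last", "asg", "asb_first", "asb_last",
}


def do_name(words, query):
    """name(...) reads its whole content as one string."""
    query["name"] = " ".join(words)


def do_use(words, query):
    """use(...) expects exactly one dataset-type keyword."""
    (kind,) = words
    kind = kind.lower()
    if kind not in DATASET_TYPES:
        raise ValueError(f"unknown dataset type: {kind}")
    query["use"] = kind


def do_season(words, query):
    """season(...) expects an optional year followed by an optional keyword."""
    words = list(words)
    year = query["year"]  # default year unless one is given
    if words and words[0].isdigit():
        year = int(words.pop(0))
    part = words.pop(0).lower() if words else "regular"
    if part not in SEASON_KEYWORDS or words:
        raise ValueError("season expects [year] [keyword]")
    query.setdefault("seasons", []).append((year, part))


HANDLERS = {"name": do_name, "use": do_use, "season": do_season}


def interpret(statements, default_year):
    """Run each (keyword, words) pair through the handler for that keyword."""
    query = {"year": default_year}
    for keyword, words in statements:
        HANDLERS[keyword.lower()](words, query)
    return query


query = interpret(
    [("name", ["My", "Data", "Set"]), ("use", ["MSB"]), ("season", ["2014", "post"])],
    default_year=2015,
)
```

The same three words, My, Data, and Set, would be rejected by use and accepted by name, which is the sense in which the parentheses act as quote marks.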

In the event that string constants are needed, the language can support the str(Hello, world!) object. If you wanted to get formal and treat all source text identically, you could also have an int(2014) object.

In general I like the idea of the keyword(content) syntax for configuration. It works well in a readable text file or on the command line (although statements with embedded spaces might need quoting in that case).

Next is extending the query language to provide some data massaging. For example, in a query for the whole season, I might want to group the results by month. This requires sums and averages (and more!) across subsets of the dataset.
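
As a sketch of the kind of data massaging meant here, the following Python groups per-game rows by month and computes a count, a sum, and an average for one field. The row shape (a dict with a date and a runs value) and the summarize_by_month helper are hypothetical.

```python
from collections import defaultdict
from datetime import date


def summarize_by_month(rows, field_name):
    """Group rows by (year, month) and total/average one numeric field."""
    buckets = defaultdict(list)
    for row in rows:
        key = (row["date"].year, row["date"].month)
        buckets[key].append(row[field_name])
    return {
        month: {
            "games": len(values),
            "total": sum(values),
            "average": sum(values) / len(values),
        }
        for month, values in sorted(buckets.items())
    }


# Hypothetical rows standing in for a season-long query result.
rows = [
    {"date": date(2014, 4, 1), "runs": 3},
    {"date": date(2014, 4, 2), "runs": 5},
    {"date": date(2014, 5, 1), "runs": 2},
]
summary = summarize_by_month(rows, "runs")
```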

At some point, as power grows, one can end up writing a fairly complete programming language (just implementing a rich set of expression operators is a good chunk of the way there). If things require that kind of power, then I might as well do the coding in Python in the first place.

The goal is that a very simple query (many of the generated pages use season-to-date stats) can be fed to simple data specifications that drive HTML templates. The keyword here is simple. If it can’t be kept simple, then it has to stay on the drawing board as just an interesting exercise.

The target exercise is an application that takes an HTML template, plus a set of simple query-configuration files, and produces a set of HTML pages. The pages will list a set of stats by season, monthly, ASG-split, and season-thirds.
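
A very rough Python sketch of that shape, assuming one query result has already been split into named views (monthly, ASG-split, and so on) and that the template uses $title and $rows placeholders in the style of the standard library's string.Template. The render_pages function, the placeholder names, and the view layout are all hypothetical.

```python
import html
from string import Template


def render_pages(template_text, title, views):
    """Render one HTML page per view from a single shared template."""
    template = Template(template_text)
    pages = {}
    for view_name, rows in views.items():
        # Each row is a (label, value) pair; escape both for safe HTML.
        body = "\n".join(
            f"<tr><td>{html.escape(str(label))}</td>"
            f"<td>{html.escape(str(value))}</td></tr>"
            for label, value in rows
        )
        pages[f"{view_name}.html"] = template.substitute(
            title=f"{title}: {view_name}", rows=body
        )
    return pages


template_text = "<h1>$title</h1><table>\n$rows\n</table>"
views = {
    "monthly": [("April", 8), ("May", 2)],
    "asg-split": [("Before", 6), ("After", 4)],
}
pages = render_pages(template_text, "Runs", views)
```

Because every view comes from the same query result, the query runs once no matter how many pages are produced.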