The language I showed you last time, in LPL-3, was a fairly reasonable one. This time I’m showing you a preposterous one no one would actually use. Worse, it turns out to be something of a failure due to weird holes left by the design goal of orthogonal single-syntax construction.
But that oddness helps us focus on what a programming language actually is, so it’s worth a peek. And maybe it’ll give you a laugh.
A reasonable question is: Why the focus on a single-syntax orthogonal design?
Those, by the way, are similar but not identical ideas.
A single-syntax design uses one lexical production to construct all possible statements in the language. XML is a wonderful example of a single-syntax language (although XML isn’t a programming language).
Single-syntax languages make a system easy to parse (a topic we’ll take up down the road). But, as you’ll see, a single-syntax can make a language verbose, noisy, and less usable. (Languages like XML and Lisp are exceptions where the single-syntax is a huge win.)
An orthogonal design requires that the parts of the syntax are all important and serve a similar function across different statements. This is a more subtle condition that a language (often, but not necessarily, single-syntax) can exhibit.
A way to think of it is as a matrix generated by working out all possible syntactical statements in the language. Each “slot” in the matrix is one way of using the syntax to create a command. An orthogonal language fills all the slots with valid, distinct, useful commands.
As a simple example, suppose a CPU implements a command set where every instruction has three parts and looks like this:
command register [ register-or-literal ]
If the command set is orthogonal then all available registers are legal in the second and third parts. Note that defining the third part as optional makes the system less orthogonal. (Effectively it means the length of a command depends on the command.)
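To make the matrix idea concrete, here is a small Python sketch. The command, register, and literal names are invented for illustration (they are not from any real CPU), and the sketch assumes the third part is always present.

```python
from itertools import product

# Hypothetical instruction set: every name here is illustrative only.
COMMANDS = ["load", "add", "cmp"]
REGISTERS = ["r0", "r1", "r2"]
LITERALS = ["#0", "#1"]


def all_slots():
    """Every syntactically possible 'command register register-or-literal' triple."""
    return list(product(COMMANDS, REGISTERS, REGISTERS + LITERALS))


def is_orthogonal(is_legal):
    """True when every slot in the matrix is a legal instruction."""
    return all(is_legal(cmd, dst, src) for cmd, dst, src in all_slots())


def accepts_all(cmd, dst, src):
    """A fully orthogonal command set: nothing is refused."""
    return True


def load_restricted(cmd, dst, src):
    """A less orthogonal set: 'load' (hypothetically) refuses r2 as its target."""
    return not (cmd == "load" and dst == "r2")
```

Calling is_orthogonal(accepts_all) reports True, while is_orthogonal(load_restricted) reports False because one command leaves holes in the matrix.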
Orthogonality makes a system easier to build, learn, and use. That said, it’s an abstract language purity goal often necessarily and appropriately sacrificed. It’s one of those things that makes you smile in appreciation when a designer pulls it off.
Anyway. Without further ado,…
Little Programming Language — 2
The single syntax for LPL-2 is:
statement := keyword ( expression ) { block }
The syntax rules are:
- A block is zero or more statements.
- If keyword is absent, do is implied.
- If expression is absent a value of 0 or “” is implied.
- If expression is absent a logical value of True is implied.
The way those last two rules contradict each other is the first hint of non-orthogonality. As you’ll see, it’s necessary to make selection statements work more intuitively. But usually, in language design, zero (0) and the empty string (“”) have a logical value of False.
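As a rough illustration of how little parsing the single production needs, here is a minimal Python sketch. It is my own assumption of how such a parser might look, not a reference implementation: it only handles blocks that contain statements, it treats the expression as raw text with no nested parentheses, and it treats empty parentheses the same as an absent expression (recorded as None).

```python
def skip_ws(src, pos):
    """Advance pos past any whitespace."""
    while pos < len(src) and src[pos].isspace():
        pos += 1
    return pos


def parse_block(src, pos=0):
    """Parse zero or more statements; stop at '}' or at end of input."""
    statements = []
    while True:
        pos = skip_ws(src, pos)
        if pos >= len(src) or src[pos] == "}":
            return statements, pos
        stmt, pos = parse_statement(src, pos)
        statements.append(stmt)


def parse_statement(src, pos):
    """Parse one 'keyword ( expression ) { block }' statement."""
    start = pos
    while pos < len(src) and (src[pos].isalnum() or src[pos] in "_@"):
        pos += 1
    keyword = src[start:pos] or "do"      # absent keyword: do is implied
    pos = skip_ws(src, pos)
    expression = None                     # None marks an absent expression
    if pos < len(src) and src[pos] == "(":
        end = src.index(")", pos)         # sketch only: no nested parentheses
        expression = src[pos + 1:end].strip() or None
        pos = skip_ws(src, end + 1)
    if pos >= len(src) or src[pos] != "{":
        raise SyntaxError(f"expected '{{' at position {pos}")
    body, pos = parse_block(src, pos + 1)
    if pos >= len(src) or src[pos] != "}":
        raise SyntaxError("unclosed block")
    return {"keyword": keyword, "expression": expression, "block": body}, pos + 1
```

One function per grammar rule is all it takes, which is the parsing payoff of a single syntax. What an absent expression means (0, the empty string, or True) is left to whatever interprets the tree.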
Here are statement definitions for the usual suspects:
Unconditional Statement Block
{ block }
do { block }
not { block }
Perform the statements in block. The do keyword is optional. The not keyword can be used to stop a block of statements from executing (possibly useful in debugging).
Conditional Statement Block
( expression ) { block }
do ( expression ) { block }
not ( expression ) { block }
Perform the statements in block if expression is True. The do keyword is optional. The not keyword in this case reverses the logic (expression must be False to perform the block).
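The gate-keeping behavior of do and not can be sketched in a few lines of Python. The function name should_run is hypothetical, and None stands in for an absent expression, which the rules treat as True for flow control.

```python
def should_run(keyword, condition=None):
    """Decide whether a do/not block executes.

    condition is None when the expression is absent, which is
    treated as True here, following the flow-control rule.
    """
    truth = True if condition is None else bool(condition)
    if keyword == "do":
        return truth
    if keyword == "not":
        return not truth
    raise ValueError(f"not a gate keyword: {keyword}")
```

With no expression, should_run("do") is True and should_run("not") is False, which is why a bare not disables its block.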
Explicit If-Else Statement
if ( expression ) { block }
In this reduced form, the if keyword works exactly like the do keyword. That’s a huge non-orthogonal hole!
When two commands have identical effects, one of them is redundant (a violation of orthogonality), so a command is wasted. Waste is also a violation of orthogonality, and one that can occur without redundancy: the not keyword arguably wastes command slots.
if ( expression ) {
    do { block }
    else ( expression ) { block }
    else ( expression ) { block }
}
In this full form, the if and else keywords implement an If-ElseIf-Else construct. The do keyword for the first inner statement is optional. At least one else clause is required; others are optional. An empty expression in an else acts as a True value.
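Here is one way to read the full form, sketched in Python. This is my interpretation rather than a specification: the if expression gates the first inner block, the else clauses are then tried in order, and None stands for an empty expression. The function name run_if and its argument layout are made up for the sketch.

```python
def run_if(if_condition, then_block, else_clauses):
    """Run the first block whose condition holds.

    then_block is a zero-argument callable; else_clauses is a list of
    (condition, callable) pairs. A condition of None means the
    expression was empty, which counts as True.
    """
    if if_condition is None or if_condition:
        return then_block()
    for condition, block in else_clauses:
        if condition is None or condition:
            return block()
    return None  # no clause matched
```

An else with an empty expression always matches, so putting one last gives the usual final Else branch.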
Something to consider is whether the reduced form should be allowed. It requires a distinction between an if block of statements and one with else statements.
Another possibility is to do away with if altogether and use do and not in clumsy serial fashion to implement If-Else:
do ( expression ) { true-statements }
not ( expression ) { false-statements }
In both statements, expression would be the same. You might think that looks awfully clumsy (and it is), but that’s what XSLT’s xsl:if makes you do, because it has no else branch. You don’t even get a not keyword; you write a pair of if statements with opposing logic, or switch to the wordier xsl:choose.
While and Until Loop
while ( expression ) { block }
Do statements in block while expression is True. An empty expression creates an infinite loop.
until ( expression ) { block }
Do statements in block until expression is True. Here an empty expression creates a loop that never executes.
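The two loops differ only in how they read the test, which a short Python sketch can show. It assumes until stops as soon as the expression becomes True (the reading under which an empty expression gives a loop that never runs). The name run_loop and the max_iterations safety cap are my additions, not part of LPL-2.

```python
def run_loop(keyword, condition, body, max_iterations=1000):
    """Run body under while/until rules and return the number of passes.

    condition is a zero-argument callable, or None for an empty
    expression (treated as True). max_iterations is only a safety cap
    so the sketch cannot spin forever.
    """
    if keyword not in ("while", "until"):
        raise ValueError(f"not a loop keyword: {keyword}")
    test = condition if condition is not None else (lambda: True)
    count = 0
    while count < max_iterations:
        value = bool(test())
        keep_going = value if keyword == "while" else not value
        if not keep_going:
            break
        body()
        count += 1
    return count
```

With an empty expression, the while loop runs until the cap stops it and the until loop makes zero passes.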
If you think any of these were weird (and, admittedly, yeah,… can’t deny it, but that was kinda the point), wait until you see:
Variables
The keyword is var. The basic idea is that the value(s) in expression is (are) bound to the name(s) found in block.
var (42) { the-answer }
var ("hello!") { a-string }
var (0,0,0) { x, y, z }
var (0) { x, y, z }
var () { x, y, z }
Which brings up a point I haven’t mentioned: expression can be a list of expressions. Considered as a logical value, a list of expressions is a conjunction (logical AND): all must evaluate True for the list to be True. Here they form a list of values to be bound to names.
The var block is a name or a list of names, depending on expression. If there are more names than values, the last value in the list is applied to the remaining names. In the example, x, y, and z are set to zero in all three of the last cases.
If there are more expression values than names, the last name is bound to the list of remaining values:
var (x,y,z) { point }
var (1:10) { range }
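The binding rules are easier to check when written out. Here is a Python sketch of them; the function name bind and the use of a dict for the resulting bindings are my own choices. It covers plain value lists only, with a range like 1:10 counting as a single value.

```python
def bind(values, names):
    """Bind a list of values to a list of names using the var rules."""
    if not names:
        raise ValueError("var needs at least one name")
    if not values:
        values = [0]                      # absent expression: 0 is implied
    bindings = {}
    if len(values) > len(names):
        # More values than names: the last name takes the leftover values as a list.
        head = len(names) - 1
        for name, value in zip(names[:head], values[:head]):
            bindings[name] = value
        bindings[names[-1]] = list(values[head:])
    else:
        # More names than values: the last value is reused for the remaining names.
        for i, name in enumerate(names):
            bindings[name] = values[min(i, len(values) - 1)]
    return bindings
```

For example, bind([0], ["x", "y", "z"]) sets all three names to zero, and bind([1, 2, 3], ["point"]) binds point to the whole list.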
Part of the fun of language design is deciding what data types are native to your language. As you can see, so far we have numbers, strings, lists, and ranges. Space prohibits exploring this in detail here, and data type design is a separate and very complex topic.
Assignment (through the set keyword) works the same way, except that each name must already exist, having first been defined via a var statement:
set (0) { x, y }
set (21,42) { x, y }
set (x,y) { point }
We’ll dodge the harmful GOTO by not defining anything like it, but that means we do need the final required member of the troupe…
Functions (Callable Code Units)
We can think of a function as defining a new keyword whose expression is the set of passed arguments and whose block contains the statements of the function. Let’s do that simply by using an at-sign (@) prefix to say define this keyword.
@adder (a, b) { return () {a+b} }
Which introduces the return keyword (conditional returns… how cool is that?) and the fact that we’re apparently supporting ordinary math expressions such as a+b. (And, yes, we are; keyword constructs for operators would be ugly.)
We can also suppose the language has some useful functions already defined (another language design choice is whether to make these native or part of some standard library):
print (stdout) { x, y, z }
input (stdin) { keypress }
These work somewhat like the var keyword in supporting lists of names in the block. The expression is meant to be an open file stream of some kind. (It’s possible the print statement could support multiple streams, but it’s hard to see how the input one could.)
This does raise the point that these functions are taking parameters in both the expression and block parts of the statement! That implies a function has access to the list of statements in the calling block. That access comes through the special @ array:
@sum (n) {
    not (exists n) { set(len @) {n} }
    var (0) {index,total}
    while (index < n) {
        set(total + @[index]) {total}
        set(index + 1) {index}
    }
    return () {total}
}
The code assumes len and exists operators (not functions) that return the length of some object and test for an optional object, respectively.
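For comparison, here is a rough Python rendering of the same routine. It is a sketch under a few assumptions: the caller’s block (the @ array) becomes an explicit items argument, an omitted n becomes None, and the name lpl_sum is made up. The loop advances index on every pass, which it needs to do in order to terminate.

```python
def lpl_sum(items, n=None):
    """Sum the first n items, or all of them when n is omitted."""
    if n is None:              # plays the role of: not (exists n)
        n = len(items)         # plays the role of: len @
    index, total = 0, 0
    while index < n:
        total += items[index]
        index += 1             # step to the next item so the loop ends
    return total
```

Calling lpl_sum([1, 2, 3]) sums the whole list, while lpl_sum([1, 2, 3, 4], 2) sums only the first two items.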
An important point is the difference in the role of the expression in flow control statements versus its role in functions and variables. In the latter case, the expression is applied to the block. In the former case, it acts as a gate-keeper.
All that remains is some example code. Above is a function that sums a list of numbers (or the first n of a list). Here, once again, the (naive) Fibonacci routine:
@Fib (n) {
    (n < 2) { return () {n} }
    var (Fib(n-1){}) { a }
    var (Fib(n-2){}) { b }
    return () {a+b}
}
For reference, the equivalent Python code:
def Fib(n):
    if n < 2:
        return n
    a = Fib(n-1)
    b = Fib(n-2)
    return a+b
That’ll do it for this time. Next time I’ll wrap up with another weird Little Programming Language, LPL-1.