Operator grammar

Programs and texts in other formal languages (such as specification languages) can often be made more readable by well-considered choice of notation. (Of course, they more often be made unreadable by an ill-considered choice.) It is common to support infix notation in some form, typically with a fixed stock of operators of fixed associativity and precedence level, but sometimes (as in the programming language Haskell Haskell's page) the author can invent and declare their own operators.

This note contains some thoughts about infix and bracketing operators like the following examples:

        a <| b |> c 
        a --[ b1, .. , bk ]--> c 
        [], [ b1, .. , bk ] ...

The first (ternary) operator was introduced by Hoare [`A couple of novelties in the propositional calculus' Zeitschr, f. math. Logik under Grundlagen d. Math 1985]] to represent conditional expressions. (In the forword to Hoare's CSP book, Dijkstra refers to Hoare as "notationally intrepid". I believe that the symbols <| and |> are sometimes known as "Toblerone brackets".) The middle operand is the boolean condition, the first one is the `then', and the last one is the `else'. Hoare's notation has among its merits that it makes it easier to spot opportunities for using algebraic laws of idempotence, associativity, symmetry, distributivity and so on.

The second notation is a generalisation of such Hoare-style split operators. The operator is in two `halves' --[ and ]--> . The two halves of the operator should be thought of as delimiting the scope of a nested grammar in which comma is a (say) right associative infix operator, of lowest precedence.

The third line shows Haskell's built in notation for lists. Here the left and right operands have disappeared, and we are left only with the material in the middle.

This is probably all about "punctuation" in some broad sense.

Grammar of nestable operator grammars

This is just a crude sketch.
      grammar ::= (operator declaration)*

      operator declaration
	      ::= "unary" ; symbol 
		| "infix" ; symbol ; admin 
		| "split" ; symbol ; grammar ; symbol ; admin 

      admin   ::= associativity ; precedence
		| precedence ; associativity

	      ::= "right associative" 
		| "left associative"
		| "not associative" 
		| nil

	      ::= "precedence" ; digit 
                | nil
Perhaps later grammar declarations should simply overide previous ones.


The associativity of an infix or split operator says how it "behaves" with respect to itself. It seems reasonable clear that there are only three kinds of associativity a sane person could want.


The precedence of an infix or split operator says how it "behaves" with respect to other infix or split operators. We seem to need at least two "layers", each having at least three precedence "strata" -- additive, multiplicative and exponential. Think of the usual syntax for boolean combinations of arithmetic equations. Here we have +,*,^ for expressions, then equality lower than all these, then +,*,^ for propositions. It seems reasonable to allow for 10 levels, although this is certainly rather `ad hoc'. This happens to be as in Haskell.

Section notation

Section notation is sometimes used (as in Haskell) in connection with infix operators. Richard Bird [??]
             (op e) = \ x -> x op e
             (e op) = \ x -> e op x
It can be very concise, and can help make important algebraic properties visible. Its abuse can also lead to hideous obfuscation.

How should section notation work for split operators? My inclination is to say it is fine, provided that the `middle' operands are all supplied. So one can use a section notation such as

              (<| cond |>)
              (<| cond |> default) 
              (patch <| cond |>) 
in connection with Hoare's operator (this appears to have been his intention).

What is the most general description of section notation?

Quotation symbols

One can use the same symbol for both the opening and closing symbols of a split operator, and it is natural to call this a quotation symbol. Perhaps one can say things like
          split " "
          split * infix | right 0 * 
Could one arrange for things as complicated as C's string and character syntax? This probably needs a very strong notion of `symbol'.

We might want to escape some external operators, perhaps by prefixing them with for example a common symbol.

Postfix? Superscripts? The whole shebang?

Let's not get carried away.

Haskell has only one, very special, prefix operator : `-' for minus.

Prefix. Where a leaf symbol `f' occur always at the head of a path of some specified length arity(f), we do not need brackets to represent binary trees - as in Polish notation. (This might become relevant as the global supply of parentheses dwindles.)

Postfix. Among postfix operators we have superscripts and subscripts, in which lowering and raising indicates a kind of bracketing. Apart from these `2-dimensional' things, it is striking how few postfix operators occur in nature. (Or is it? We have right sections. But they involve parentheses. Postfix notation for substitutions. `Algebraic' notation for application of functions.)

Indentation is a way of economising on parentheses (the dreadful expense of ink ...) by exploiting a two dimensional medium.

Juxtaposition is a very strange operator. It seems to play the role of a meta-operator - putting things `together' in a particular order. It needs no ink. It is the mother of all operators.

Variable binding notation

Variable binding operators - integral, derivative, sums and products, lambda notation, Bourbaki/de-Bruijn boxes and links. Bound variables

We use names in so many ways - as place holders, as constants, as links, as tags, as unknowns, .. .

Abadi and Cardelli have defined somewhere a lexically scoped extensible syntax that in some sense `copes with' not only names for bound variables but also names for selectors. (Maybe other uses of names too.)

Raanta (Type theoretical grammar).

Landin. sugar.

Peter Hancock
Last modified: Thu Aug 24 13:15:01 BST 2000