Extended Backus Naur Form - EBNF

There is a need to formulate programming languages very formally, in order toensure that they can be recognised by computer programs. To help with this a standard form of notation is used. The original format was adapted from the area of linguistics during the formulation of the programming language Algol 60 in 1958. To honour the leading figure in this effort, Peter Naur, the revised notation was renamed from Backus Normal form to Backus Naur Form. It is widely known as simply BNF.
What is Backus Naur Form?

How do we write our definitions of C and C++ elements

Most programming languages are today defined as having grammars written in some extension of BNF. The precise layout and use of meta-symbols varies. Here we consider an Extended BNF (EBNF) typical of the variants used to describe the C programming language family.

A grammar written in this way defines rules of syntax. These rules specify what sequences of symbols are allowed to occur in a legal program and in what ways these symbols may be combined. They do not specify how such sequences are to be interpreted. That requires semantic rules, which attach meaning to the syntactic structures of the grammar

All versions of BNF and EBNF have basically the same elements.

In the rules we write to define C and C++ in these notes, we will write as follows:
Terminal symbols
will be written in bold, black type;

will be written in italic type; on colour browsers they will appear in red;

The symbol separating
the non-terminal being defined and its definitions will be a space or, more usually, the start of a new line followed by increased indentation; thus
if-statement
       if ( expression ) statement
defines a new symbol, if-statement to be represented in a program as the keyword if, followed by the terminal symbol (, followed by a sequence which matches the definition of the non-terminal expression, followed by the terminal ), followed by a sequence matching the definition of a statement.

More than one pattern representing a particular symbol,
could be written in separate rules, each having the same defined symbol. It is simpler, however, to write them as a series of definitions on separate lines, all indented by the same amount more than the symbol they define; this allows definitions which are too long to fit on one line to continue on the next line, with extra indentation to show that they are not alternatives. Thus:
if-statement
         if ( expression ) statement
         if ( expression ) statement
               else statement
gives two rules for the appearance of an if-statement. The second of these is split over two lines, for readability.

Optional symbols
which can be included or omitted in a sequence, are indicated by a question mark, written in non-bold, roman type and following the optional symbol. It is necessary to make it clear when a symbol is not intended to form part of the sequence being described, but is merely part of a notation (meta-language) used to help in defining what can appear. On browsers with colout, this sort of meta symbol will additionally appear in green. Where a series of symbols must either all be present or all left out, they will be enclosed in parentheses, with the question mark following the closing parenthesis. Like the question mark these parentheses are meta-symbols and so will appear in the same format. Thus:
if-statement
      if ( expression ) statement ( else statement )?
defines exactly the same possibilities as the previous rule for an if-statement.

Repetition of a symbol
within a sequence is defined by writing the symbol followed by an asterisk, to mean that a sub-sequence matching that symbol should appear zero or more times. It is sometimes useful to use a plus sign instead, to mean "at least once and then as many times as are wanted". Both of these are meta-symbols and so are written in the format used previously for the option symbol. Again, a longer sequence may be indicated as being repeated by enclosing it in meta-language parentheses. Thus:
compound-statement
      {
            ( declaration )*
            ( statement )*
      }
gives the ANSI C (but not C++) syntax for a compound statement or block.
Back to the C notes index Back to the C++ notes index