Eclipse Xtext Grammar- Martin Baker

This page follows on from some introductory information that I have put about Xtext here.

The main definition of the language is done by editing the file with the '.xtext' extension. For simple Xtext projects this may be the only file you need to configure. This file defines the following:

Lexer - This defines how the language is divided up into the components of the language such as keywords, operations, names and so on. I have put more about the lexer on this page.
Grammar - this defines the actual structure of the language, this is defined as a set of rules, more information below.
Mapping to Model - it is not enough to be able to recognise a grammar, we also need to map it onto an ecore model, the ecore model can then generate the language editor and so on. More information about the ecore model here and mapping the grammar to it on this page.

Grammar

The body of the Xtext grammar file is a list of rules, in the grammar file (which has an .xtext extension), written by the person implementing the DSL.

There is not space in this page to give a full explanation of the theory of language grammars, if you are not familiar with this theory you will need to learn about the following topics:

Xtext uses Antlr parser which uses LL(*) algorithm . This does not allow left recursive grammars (see this blog post by Sven Efftinge about how to avoid left recursion in expressions). Note XText does not support ANTLR4 which uses a new adaptive parsing strategy, this accepts any grammar and converts it to LL(*), unfortunately in XText we still have to avoid left recursion by hand.

One of the first decisions is to work out the primitives required and which of these are handled by the lexer.

Lexer

Terminal rules, each terminal rule returns an atomic type.

Syntax Element	Means
.	wild card
! 'x'	invert (not x)
('0'..'9')	character range

More information about the lexer on this page.

Grammar

Removing Left Recursion

Xtext (and antlr3) does not allow left recursion so we need to remove it.

Binary Operation Example

Here we look at removing left recursion in a binary operation rule, in this case '+'.

The following would be an intuitive way to define addition but it is left (and also right) recursive:

Expression:
  Atomic |
  left=Expression '+' right=Expression
;

where 'Atomic' is a number without '+' in it.

We can remove this left recursion in a right or left associative way as follows:

right associative:

Expression:
  left=Atomic ('+' right=Expression)?
;

where '?' suffix means one or none instances.

left associative:

Expression:
  left=Atomic ('+' right=Atomic)*
;

where '*' suffix means any number (zero or more) of instances.

Syntax Elements of Grammar

Syntax Element	Means
default (no operator)	exactly one
? suffix	one or none
* suffix	any (zero or more)
+ suffix	one or more
\| infix	or
& infix	unordered group

Cross References

Square brackets represent a cross reference, that is a non-containment reference.

Syntax Element	Means
[t\|n]	cross reference to an (already existing) object of type' t' via a name that has the syntax of 'n'

Where:

t=ReferencedEClass: type of referenced object
(not grammar rule but an EClass )
n=CrossReferenceTerminal: string returning grammar rule.
Used to cross reference to referenced object.

If the CrossReferenceTerminal is not explicitly included then it will default to ID, that is, [Type] is a shorthand for [Type|ID]

More about cross references on this page.

Syntactic Predicates

=> Means that this option can be parsed it should do so without considering any other alternatives.

example:

The traditional example for syntactic predicates is the dangling else-problem shown by this grammar:

'if' Expression (=> 'else' Expression)?

This tells Xtext that if it can parse an 'else' it should do so without considering any other alternatives.

I have discussed syntactic predicates further on page here.
There is also some useful information on this external reference.

Construction of AST

The grammar file not only defines the syntax of a text file, but also how it maps to an ECore model.

There are 4 types of rule.

Rule type	Returns	identified by	Used by
Parser Rules	EClass		Parser
Enum Rules	EEnum	start with 'enum'	Parser
Datatype Rules	EDatatype (EString is implied if none has been explicitly declared)	rules that do not: call other parser rules contain any actions contain assignments	Parser
Terminal Rules	EDatatype	start with 'terminal'	Lex

rule types

If we want a Syntax element to map to a model element then we should precede it with one of the following forms:

Syntax Element	Model Element
myName =
myName +=	returns list
myName ?=	returns true/false

Name Provider

The name provider is responsible for giving a name to an object so that it may be referenced. The following uses naming:

Cross referencing, such as when using the bracket syntax like: [Type|ID]
Used as default in label provider (affects outline view, hyperlinks and content proposals,)

The default behavior uses the following conventions:

If an object has a feature with a string-type, whose name is 'name', the default name provider implementation will use the value of that feature for calculating the name of an object.

Magic Names

Generally names can be anything except the following have special meaning:

Syntax Element	Means
name	Supports cross referencing. If an object has a feature with a string-type, whose name is 'name', the default name provider implementation will use the value of that feature for calculating the name of an object.
importURI	URI of a model file whose objects should be made visible. Supports URI import of a particular file like this: Import: 'import' importURI=STRING; Where STRING is a rule with string value
importedNamespace	A namespace import normally looks as follows: Import: 'import' importedNamespace=FqnWithWildCard; FqnWithWildCard: Fqn('.')?; Fqn:ID('.'ID)

Syntax Element

Means

name

Supports cross referencing. If an object has a feature with a string-type, whose name is 'name', the default name provider implementation will use the value of that feature for calculating the name of an object.

importURI

URI of a model file whose objects should be made visible. Supports URI import of a particular file like this:
Import: 'import' importURI=STRING;

Where STRING is a rule with string value

importedNamespace

A namespace import normally looks as follows:
Import: 'import' importedNamespace=FqnWithWildCard;
FqnWithWildCard: Fqn('.*')?;
Fqn:ID('.'ID)*

There is more information about how these are used on this page also on this Itimus blog.

question

Why does this grammar work?

Import returns EuclidImport :
'import' (
   importedNamespace=QualifiedName
  | importedNamespace=QualifiedNameWithWildCard) ';'?
;

But not this?

Import returns EuclidImport :
'import'  importedNamespace=QualifiedName ('.' '*')?  ';'?
;

Other Naming Conventions


ID
INT
DOUBLE
STRING	enclosed in single or double quotes

Actions

Enclosed in braces

Simple Actions

Just a name in braces, this creates a return type.

{BinaryExpression}

Regular Actions

creates a return type and also assigns what has been parsed so far (current) to some variable.

{BinaryExpression.left=current}

Expression: Atom ({BinaryExpression.left=current} op=('+'|'-') right=Atom)*

Debugging

Debugging parser code that has errors can be very difficult to debug.

An error such as:

"Decision can match input such as 'else' using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input"

May not be too bad but some errors don't give much information and can be hard to find. Usual program debugging techniques such as breakpoints don't work.

I find it helpful to generate syntax diagrams (railroad diagrams) to be able to see the syntax. but if that does not help then we may need to use ANTLRWorks (use the Ctrl-Shift-R shortcut).

See this blog for more information.

Customising

xtext structure

If our DSL cannot easily be parsed using the built-in capability of Xtext then it is possible to customise the lexer or parser.

For instance, Python-like whitespace block delineation and macroes can be difficult in Xtext, I have provided some information here to show how this can be done.

metadata block

see also:	Additional Issues Implementing a Python-like grammar. Tutorial for a Python-like grammar in your DSL. Implementing macros in your DSL
Correspondence about this page

Eclipse Xtext Grammar

Grammar

Lexer

Grammar

Removing Left Recursion

Binary Operation Example

Syntax Elements of Grammar

Cross References

Syntactic Predicates

example:

Construction of AST

Name Provider

Magic Names

Other Naming Conventions

Actions

Simple Actions

Regular Actions

Debugging

Customising

Further Reading

EuclideanSpace

Prerequisites

Related Software

Xgener