Axiom/FriCAS Interpreter Tokeniser

This is part of some experimental code that I am writing to implement the FriCAS interpreter in SPAD. For an overview of this experiment, see the page here. For information about how this is done in the current Boot/Lisp code, see the page here.

Here we describe a scanner, or tokeniser, for our interpreter. It takes the string holding the input line and converts it to a list of tokens.

How It Works

Each token generated by this code consists of a token type and a string holding its actual value. For instance, if the token type is 'key' then the string will hold the particular keyword, such as "macro".

Token Type	Meaning
id	identifier such as the name of a variable
key	keyword
integer	A numeric integer literal. A negative sign is not held in this token; instead there will be a '-' token preceding it.
rinteger
float	A floating-point literal. As well as digits it may contain '.', 'e', 'E' and '-', which makes it difficult to scan as a single terminal value.
string	any characters wrapped in double quotes.
comment
negcomment
error
spaces
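
As an illustration only, the token structure described above could be modelled like this. This is a Python sketch, not the SPAD code itself: the names TokenType and Token are my own assumptions, and only the token type names come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class TokenType(Enum):
    # Values mirror the token-type table; the enum itself is an assumption.
    ID = "id"
    KEY = "key"
    INTEGER = "integer"
    RINTEGER = "rinteger"
    FLOAT = "float"
    STRING = "string"
    COMMENT = "comment"
    NEGCOMMENT = "negcomment"
    ERROR = "error"
    SPACES = "spaces"


@dataclass(frozen=True)
class Token:
    kind: TokenType  # the token type
    text: str        # the string holding the actual value

    def __str__(self) -> str:
        # Same type="value" shape as the interpreter output, e.g. key="PLUS"
        return f'{self.kind.value}="{self.text}"'


# A negative number is a MINUS key followed by an unsigned integer token.
negative_three = [Token(TokenType.KEY, "MINUS"), Token(TokenType.INTEGER, "3")]
print("[" + ",".join(str(t) for t in negative_three) + "]")
```

Keeping the sign out of the integer token means the scanner never has to decide whether '-' is unary or binary; that decision is left to the parser.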

This tokeniser is driven by a state table. As we scan across the input line, the table determines the next state from the current state and the character being scanned.

	Character Just Read
Current State		space	double quote	alphabetic	numeric	other
	init	space	string	sym	integ	op
	space	space	string	sym	integ	op
	string	string	init	string	string	string
	sym	space	string	sym	sym (symbol names can contain numeric values)	op
	integ	space	string	sym or float if 'e' or 'E'	integ	op or float if '.'
	float	space	string	sym or float if 'e' or 'E'	float	op
	op	space	string	sym	integ	op
	comment	comment	comment	comment	comment	comment

The 'comment' state is triggered if the 'op' text contains '--' or '++'.

Each time the state changes, a new token is added to the list being generated.
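
The table-driven scan can be sketched in Python as follows. This is a rough model under stated assumptions, not the SPAD implementation: the function names (char_class, next_state, scan) are hypothetical, the labels are state names rather than final token types, and I have assumed that the integ to float, op to comment and closing-quote transitions extend the current token instead of starting a new one.

```python
CLASSES = ("space", "quote", "alpha", "digit", "other")


def char_class(ch):
    """Map a character to one of the table's column headings."""
    if ch.isspace():
        return "space"
    if ch == '"':
        return "quote"
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"


def row(*targets):
    """Build one table row: character class -> next state."""
    return dict(zip(CLASSES, targets))


DEFAULT = row("space", "string", "sym", "integ", "op")
TABLE = {
    "init": DEFAULT,
    "space": DEFAULT,
    "string": row("string", "init", "string", "string", "string"),
    "sym": row("space", "string", "sym", "sym", "op"),
    "integ": DEFAULT,  # the '.', 'e' and 'E' exceptions are in next_state
    "float": row("space", "string", "sym", "float", "op"),
    "op": DEFAULT,
    "comment": row(*["comment"] * 5),
}


def next_state(state, ch, pending=""):
    """Next state; `pending` is the text of the token being built."""
    if state == "integ" and ch == ".":
        return "float"
    if state in ("integ", "float") and ch in "eE":
        return "float"
    new = TABLE[state][char_class(ch)]
    if state == "op" and new == "op" and any(m in pending + ch for m in ("--", "++")):
        return "comment"
    return new


def scan(line):
    """Group the characters of `line` into (state, text) pairs."""
    state, out = "init", []
    for ch in line:
        new = next_state(state, ch, out[-1][1] if out else "")
        closing_quote = state == "string" and new == "init"
        extends = (new == state or closing_quote
                   or (state, new) in (("integ", "float"), ("op", "comment")))
        if out and extends:
            label = out[-1][0] if closing_quote else new
            out[-1] = (label, out[-1][1] + ch)
        else:
            out.append((new, ch))
        state = new
    return out


print(scan("1.0 + a3"))
print(scan("x --y"))
```

Because consecutive 'other' characters stay in the 'op' state, a run such as "=-" ends up in a single token in this sketch.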

In the case of errors, an error token will be put in the token list. There is a function that scans the list for error tokens. If it finds one, the following stages of parsing need not be carried out and the error string can be displayed.
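
A minimal Python sketch of that error check, assuming tokens are held as (type, text) pairs. The function name find_errors is hypothetical; it is not the name used in the SPAD code.

```python
def find_errors(tokens):
    """Return the text of every error token in the list."""
    return [text for kind, text in tokens if kind == "error"]


tokens = [("id", "b2"), ("error", "=-"), ("integer", "3")]
errors = find_errors(tokens)
if errors:
    # Skip the later parsing stages and report the offending text instead.
    print("tokeniser error at:", ", ".join(repr(e) for e in errors))
```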

Testing It

We can try out the tokeniser in isolation by calling 'spadTokenise' from the existing interpreter. For information about downloading and compiling the code see this page.

(1) -> spadTokenise("1+2")

   (1)  [integer="1",key="PLUS",integer="2"]
                                                              Type: Tokeniser
(2) -> spadTokenise("1.0 + a3")

   (2)  [float="1.0",spaces=" ",key="PLUS",spaces=" ",id="a3"]
                                                              Type: Tokeniser
(3) -> spadTokenise("b2= -3")  

   (3)  [id="b2",key="EQUAL",spaces=" ",key="MINUS",integer="3"]
                                                              Type: Tokeniser

To Do

There are still some things to be fixed:

(4) -> spadTokenise("b2=-3") 

   (4)  [id="b2",error="=-",integer="3"]
                                                              Type: Tokeniser
(5) -> spadTokenise("2e-6") 

   (5)  [float="2e",key="MINUS",integer="6"]
                                                              Type: Tokeniser

Non-alphanumeric symbols need to be split: for example, a=-3 only works if we put a space between = and -.
Floats are not yet handled correctly: for example, 3e-5 is not scanned as a single float token.
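
One possible way to handle the first item is to split a run of operator characters by longest match against the known operators. This is only a Python sketch of the idea, not a fix that exists in the code: split_operators is a hypothetical name, and the operator set contains only the three keywords seen in the examples on this page.

```python
# Assumed operator set: only the keywords that appear in the examples above.
OPERATORS = {"+": "PLUS", "-": "MINUS", "=": "EQUAL"}


def split_operators(run):
    """Split a run of operator characters into key tokens, longest match first."""
    tokens, i = [], 0
    longest = max(map(len, OPERATORS))
    while i < len(run):
        for size in range(min(longest, len(run) - i), 0, -1):
            piece = run[i:i + size]
            if piece in OPERATORS:
                tokens.append(("key", OPERATORS[piece]))
                i += size
                break
        else:
            # Nothing matched: report the rest of the run as an error token.
            tokens.append(("error", run[i:]))
            break
    return tokens


print(split_operators("=-"))
```

Trying the longest candidate first means that multi-character operators, once added to the set, would take priority over their single-character prefixes.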

Next step

The output of this tokeniser is passed on to the parser as described on the page here.
