
eval2

Eval2 does little more than eval1 did, but it handles multi-character identifiers and numbers as well as single-letter identifiers. This mainly requires a beefed-up lexical analyser, plus minor changes within the parser itself to cope with the longer tokens. I make no assumptions about operators either, and treat them as if they could be multi-character tokens too.


Eval2 source.
Eval2 executable.

Syntax grammar 2
statement -> expression "\n"
expression -> term [ add_op term ]*
term -> factor [ mult_op factor ]*
factor -> primary | "(" expression ")"
mult_op -> "*" | "/"
add_op -> "+" | "-"
primary -> number | name

Lexical grammar 2
number -> digit [ digit ]*
name -> letter [ letter | digit ]*
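
As a rough illustration of how the two grammars map onto code, here is a minimal recursive descent sketch in C with one function per grammar rule. It is not the eval2 source: the names Parser, emit and to_postfix are invented for this example, operators are assumed to be single characters, and a missing final newline is tolerated. It emits postfix text, matching the form of the sample output at the end of this page.

```c
#include <ctype.h>
#include <string.h>

/* Parser state; all names here are illustrative, not from the eval2 source. */
typedef struct {
    const char *p;    /* current position in the input line */
    char out[256];    /* postfix text accumulated so far */
    int ok;           /* cleared on the first syntax error */
} Parser;

static void expression(Parser *ps);

static void skip_blanks(Parser *ps)
{
    while (*ps->p == ' ' || *ps->p == '\t')
        ps->p++;
}

/* Append one token to the output, preceded by a space. */
static void emit(Parser *ps, const char *text, size_t len)
{
    size_t used = strlen(ps->out);
    if (used + len + 2 > sizeof ps->out) { ps->ok = 0; return; }
    ps->out[used] = ' ';
    memcpy(ps->out + used + 1, text, len);
    ps->out[used + 1 + len] = '\0';
}

/* primary -> number | name */
static void primary(Parser *ps)
{
    const char *start = ps->p;
    if (isdigit((unsigned char)*ps->p)) {
        while (isdigit((unsigned char)*ps->p)) ps->p++;
    } else if (isalpha((unsigned char)*ps->p)) {
        while (isalnum((unsigned char)*ps->p)) ps->p++;
    } else {
        ps->ok = 0;                       /* neither a number nor a name */
        return;
    }
    emit(ps, start, (size_t)(ps->p - start));
}

/* factor -> primary | "(" expression ")" */
static void factor(Parser *ps)
{
    skip_blanks(ps);
    if (*ps->p == '(') {
        ps->p++;
        expression(ps);
        skip_blanks(ps);
        if (*ps->p == ')') ps->p++; else ps->ok = 0;
    } else {
        primary(ps);
    }
}

/* term -> factor [ mult_op factor ]* */
static void term(Parser *ps)
{
    factor(ps);
    for (;;) {
        skip_blanks(ps);
        if (!ps->ok || (*ps->p != '*' && *ps->p != '/')) return;
        char op = *ps->p++;
        factor(ps);
        emit(ps, &op, 1);                 /* operator follows its operands */
    }
}

/* expression -> term [ add_op term ]* */
static void expression(Parser *ps)
{
    term(ps);
    for (;;) {
        skip_blanks(ps);
        if (!ps->ok || (*ps->p != '+' && *ps->p != '-')) return;
        char op = *ps->p++;
        term(ps);
        emit(ps, &op, 1);
    }
}

/* statement -> expression "\n". Returns 1 on success, 0 on a syntax error. */
int to_postfix(const char *line, char *out, size_t out_size)
{
    Parser ps;
    ps.p = line; ps.out[0] = '\0'; ps.ok = 1;
    expression(&ps);
    skip_blanks(&ps);
    if (!ps.ok || (*ps.p != '\n' && *ps.p != '\0')) return 0;
    if (strlen(ps.out) + 1 > out_size) return 0;
    strcpy(out, ps.out);
    return 1;
}
```

Calling to_postfix with a line such as "(qwerty+asdfgh)*(123456-zxc78v)\n" fills the output buffer with the postfix form, each token preceded by a space. Unlike eval2, this sketch reports a syntax error by returning 0 rather than exiting.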

A new lexer

If we continue to read the data one character at a time, we may be digging a hole for ourselves later on. There is no guarantee that the statements we are concerned with will come from a file-oriented source. They may be read from a file or the keyboard, but they may also be entered through a dialogue box as part of a windowed interface, so we will pass source code to the parser a line at a time. This also separates the input method from the parser, which is a sensible thing to do anyway. It will cause us no trouble as long as we are only interested in one-line expressions.

We want to chop our input up into 'tokens', which may be symbol names, operators or numbers. We must also classify each token as we start to read it, so that we know where that type of token ends. For instance, a token which starts with a digit must be a number, so we read characters until we reach one which is not a digit. A token that starts with a letter must be a name, so we read characters until we reach one which is neither a letter nor a digit.
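
The classification described above might be sketched in C as follows. The Token structure, the TOK_ names and next_token are assumptions made for this example rather than anything taken from the eval2 source, and operators are simplified to single characters.

```c
#include <ctype.h>
#include <stddef.h>

/* Token classes; illustrative names, not from the eval2 source. */
typedef enum { TOK_NUMBER, TOK_NAME, TOK_OPERATOR, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* first character of the token within the line */
    size_t length;       /* number of characters in the token */
} Token;

/* Scan one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p = *cursor;
    Token tok;

    while (*p == ' ' || *p == '\t')        /* skip blanks between tokens */
        p++;

    tok.start = p;
    if (*p == '\0' || *p == '\n') {
        tok.kind = TOK_END;                /* end of line: nothing consumed */
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;             /* starts with a digit: digits only */
        while (isdigit((unsigned char)*p))
            p++;
    } else if (isalpha((unsigned char)*p)) {
        tok.kind = TOK_NAME;               /* starts with a letter: letters or digits */
        while (isalnum((unsigned char)*p))
            p++;
    } else {
        tok.kind = TOK_OPERATOR;           /* simplification: one-character operator */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

The first character alone decides the token class, and the class decides which loop finds the end of the token. The token is returned as a pointer and a length into the original line, so nothing is copied.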

When reading in integers character by character, it is easy to convert the character stream into a value, but what about reading in floating point values? These are much more complex and by the time we know we need to read in a number we've already read in the first digit. It is too late then to call up the standard library function for reading in a floating point value. In C we can 'un-get' the character, putting it back onto the input stream, but this technique is certainly not portable to other languages. Reading the input a line at a time and parsing this line makes backtracking trivial in the lexer. This also fits in better with the idea of interpreting expressions entered through non-stream means such as a GUI dialogue box.
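
To show why backtracking becomes trivial once the whole line is in memory, here is a small sketch. The Scanner structure and try_decimal are hypothetical names, and the decimal format matched here (digits, a point, digits) is only an illustration, since the grammar on this page has integers only.

```c
#include <ctype.h>
#include <stddef.h>

/* A line held in memory plus our position in it (hypothetical names). */
typedef struct {
    const char *line;
    size_t pos;
} Scanner;

/* Consume a run of digits and return how many were consumed. */
static size_t scan_digits(Scanner *s)
{
    size_t start = s->pos;
    while (isdigit((unsigned char)s->line[s->pos]))
        s->pos++;
    return s->pos - start;
}

/* Try to match digits '.' digits. On failure, put the position back. */
int try_decimal(Scanner *s)
{
    size_t saved = s->pos;             /* remember where we started */
    if (scan_digits(s) > 0 && s->line[s->pos] == '.') {
        s->pos++;
        if (scan_digits(s) > 0)
            return 1;                  /* matched; position is past the number */
    }
    s->pos = saved;                    /* backtrack: just restore the index */
    return 0;
}
```

However many characters the failed attempt looked at, undoing it is a single assignment. A character stream can generally only push back a very limited amount.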

If we do use a library function to do some of our scanning, we have a problem knowing where to resume our own scanning. The C/C++ standard library does tell us where scanning stopped (strtod hands back a pointer to the first unconverted character, and sscanf can report the count of characters read through %n), but I'm rather doubtful about Pascal, Ada and BASIC. If the language you want to work in won't tell you how many characters it used when reading a number from a string, then you'll have to parse numbers yourself.
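
In C, one way to find the resume point is the end pointer that the standard strtod function fills in. The wrapper name scan_double below is an assumption made for this sketch.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a floating point value at *cursor.
   Returns 1 and advances *cursor past the number on success;
   returns 0 and leaves *cursor alone if no number was found. */
int scan_double(const char **cursor, double *value)
{
    char *end;
    double v = strtod(*cursor, &end);

    if (end == *cursor)       /* strtod converted nothing */
        return 0;
    *value = v;
    *cursor = end;            /* resume our own scanning here */
    return 1;
}
```

Note that strtod also accepts leading white space, a sign, and spellings such as "inf" and "nan", so a lexer would probably call a helper like this only after seeing that the token starts with a digit.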

For the moment, we're not actually evaluating any expressions, so we don't need the value and we can handle numbers in a similar way to identifiers. Only the end-of-token condition is different.
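
One way to express "only the end-of-token condition is different" is a single scanning loop that takes the condition as a parameter. This is a sketch with invented names (scan_while, number_length, name_length), not the approach taken in the eval2 source.

```c
#include <ctype.h>
#include <stddef.h>

/* A character test with the same shape as isdigit() and isalnum(). */
typedef int (*CharTest)(int);

/* Count the leading characters of text that satisfy the test. */
static size_t scan_while(const char *text, CharTest belongs)
{
    size_t n = 0;
    while (text[n] != '\0' && belongs((unsigned char)text[n]))
        n++;
    return n;
}

/* A number ends at the first character that is not a digit. */
size_t number_length(const char *text)
{
    return scan_while(text, isdigit);
}

/* A name ends at the first character that is neither a letter nor a digit.
   The caller is assumed to have checked that the first one is a letter. */
size_t name_length(const char *text)
{
    return scan_while(text, isalnum);
}
```

For example, number_length("123456-zxc78v") is 6 and name_length("zxc78v)") is 6.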

Error recovery

We can already detect several syntax errors. These are handled - if we can call it that - by exiting the program. We need a way of handling these errors properly in future, printing a helpful error message and recovering from the error. This will come 'in the next release'.

Results:


$ type eval2.dat
(qwerty+asdfgh)*(123456-zxc78v)
qwerty+asdfgh*123456-zxc78v
(  qwerty   * asdfgh)* 123456-zxc78v
qwerty+asdfgh*(123456-zxc78v)
qwerty*qwerty*qwerty*qwerty
asdfgh+123456*(zxc78v+123456*qwerty*qwerty)*asdfgh+qwerty

$ eval2
 qwerty asdfgh + 123456 zxc78v - *
 qwerty asdfgh 123456 * + zxc78v -
 qwerty asdfgh * 123456 * zxc78v -
 qwerty asdfgh 123456 zxc78v - * +
 qwerty qwerty * qwerty * qwerty *
 asdfgh 123456 zxc78v 123456 qwerty * qwerty * + * asdfgh * + qwerty +

$

Site last updated 14 July, 2004